(1.) Introductory.


In a series of memoirs presented to the Royal Society I have endeavoured to show 
that the Gaussian-Laplace normal distribution is very far from being a general law of 
frequency distribution either for errors of observation* or for the distribution of 
deviations from type such as occur in organic populations.† It is quite true that the 


* “On Errors of Judgment, &c.,” ‘Phil. Trans.,’ A, vol. 198, pp. 235-299. 
† “On Skew Variation, &c.,” ‘Phil. Trans.,’ A, vol. 186, pp. 343-414. 
normal distribution applies within certain fields with a remarkable degree of accuracy, 
notably in a whole series of anthropometric, particularly craniometric, observations.* 
In other fields it is not even approximately correct, for example in the distribution of 
barometric variations,† of grades of fertility and incidence of disease.‡ For such 
cases I have introduced a series of skew frequency curves which serve the purpose of 
describing the frequency of innumerable skew distributions well within the errors of 
random sampling. An exact test for “goodness of fit” in the case of frequency 
distributions has also been now provided.§ 

In dealing with frequency which diverges more or less conspicuously from the 
normal law we require to bear in mind at least three important points :— 

(i.) Any expression for frequency must be a graduation formula. It is not a 
disadvantage, but a fundamental requisite that it should smooth off “Scheingipfeln,” 
so far as these are irregularities within the limits of random sampling. 

Hence formulæ like those provided by THIELE‖ and WUNDT's pupils,¶ which depend 
upon taking enough “moments” to reproduce the complete frequency, are à priori 
fallacious. Many interpolation formulæ would do this completely, but such 
interpolation formulæ are not graduation formulæ. 

(ii.) The graduation formula must not depend upon the calculation of constants 
having such a high probable error that their value is practically worthless. 

Now, the probable error of high moments and products increases rapidly with their 
dimensions ; hence there is, beyond the labour of arithmetic, a practical limit to the 
number of moments or products which can be effectively used in a graduation 
formula. 

(iii.) There must be a systematic method of approaching frequency distributions, 
which can be applied to all cases with reasonably practical ease. 

Now the immense majority, if not the totality, of frequency distributions in homo- 
geneous material show, when the frequency is indefinitely increased, a tendency to 
give a smooth curve characterised by the following properties :— 

(i.) The frequency starts from zero, increases slowly or rapidly to a maximum, and 
then falls again to zero—probably at a quite different rate—as the character for which 
the frequency is measured is steadily increased. This is the almost universal 
unimodal distribution of the frequency of homogeneous series. Homogeneity may 


* ‘Biometrika,’ vol. I., p. 443; vol. II., p. 344; vol. III., p. 230. 

† ‘Phil. Trans.,’ A, vol. 190, pp. 423-469. 

‡ ‘Phil. Trans.,’ A, vol. 192, pp. 257-330; ‘The Chances of Death,’ vol. I., pp. 69, et seq.; ‘Biometrika,’ 
vol. I., p. 134 and p. 292; and for disease, ‘Phil. Trans.,’ A, vol. 186, pp. 390 and 407; A, vol. 197, 
p. 159. 

§ ‘Phil. Mag.,’ vol. 50, 1900, pp. 157-174, and ‘Biometrika,’ vol. I., pp. 154-163. 

‖ ‘Forelaesninger over Almindelig Iagttagelslaere,’ Kjöbenhavn, 1889; ‘Theory of Observations,’ 
London, 1903. 

¶ WUNDT, ‘Philosophische Studien.’ A whole series of papers, by G. F. Lipps and others, seems to me 
to quite miss the point of (i.) and (ii.) above. 
for practical purposes be taken to imply unimodality, although the converse is very 
far from true. 

(ii.) In the next place there is generally contact of the frequency curve at the 
extremities of the range. These characteristics at once suggest the following form of 
frequency curve, if $y\,\delta x$ measure the frequency falling between $x$ and $x+\delta x$: 

$$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{F(x)} \qquad \text{(i.)}.$$

For in this case we have one mode only of the frequency, i.e., at $x=-a$, and 
$dy/dx$ will vanish when $y=0$. 

But the assumption of this form, as long as $F(x)$ is general, is itself extremely 
general, and it includes cases in which $dy/dx$ may not be zero, but take any values 
from 0 to $\infty$, when $y=0$.* 

Now let us assume that $F(x)$ can be expanded by MACLAURIN's theorem, and 
equals $b_0+b_1x+b_2x^2+b_3x^3+\dots$ Then our differential equation to the frequency 
will be 

$$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{b_0+b_1x+b_2x^2+b_3x^3+\dots} \qquad \text{(ii.)}.$$


There is now absolutely no difficulty in determining the unknown constants in 
terms of the moments of the system. Multiply up and also by $x^n$, and then integrate 
throughout the range of frequency; we have 

$$\int (b_0+b_1x+b_2x^2+b_3x^3+\dots)\,x^n\,\frac{dy}{dx}\,dx = \int y\,(x+a)\,x^n\,dx \qquad \text{(iii.)}.$$

Or, noting that $y=0$ at the ends of the range, we have, with the usual notation for a 
total frequency $N$, i.e., 

$$N\mu'_n = \int y\,x^n\,dx \qquad \text{(iv.)},$$

the result by integration by parts 

$$n\,b_0\,\mu'_{n-1} + (n+1)\,b_1\,\mu'_n + (n+2)\,b_2\,\mu'_{n+1} + (n+3)\,b_3\,\mu'_{n+2} + \dots = -\mu'_{n+1} - a\,\mu'_n \qquad \text{(v.)}.$$

Hence, if we write $n=0, 1, 2, 3 \dots s$ successively, we have $s+1$ equations to find 
$a, b_0, b_1, b_2 \dots b_{s-1}$ in terms of the moments. For example, if we stop at $b_0$ we 
require two moments, at $b_1$ three moments, at $b_2$ four moments, at $b_3$ six moments, at 
$b_4$ eight moments, and at $b_{s-1}$, $s>2$, $2s-2$ moments. 


* For example, cases in which there is a minimum frequency or antimode at $x=-a$, and $dy/dx$ infinite at 
one or two values for which $y=0$, as in the frequency distributions discussed in ‘Phil. Trans.,’ A, vol. 186, 
pp. 364-5, and ‘Roy. Soc. Proc.,’ vol. 62, p. 287, “Cloudiness, a Novel Case of Frequency.” 


6 PROFESSOR K. PEARSON ON THE GENERAL THEORY OF 


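There is no difficulty whatever in finding the $b$'s; we have the system of equations, 
where $\mu'_0=1$: 

$$\begin{aligned}
\mu'_0 a + 0\times b_0 + \mu'_0 b_1 + 2\mu'_1 b_2 + 3\mu'_2 b_3 + 4\mu'_3 b_4 + \dots &= -\mu'_1,\\
\mu'_1 a + \mu'_0 b_0 + 2\mu'_1 b_1 + 3\mu'_2 b_2 + 4\mu'_3 b_3 + 5\mu'_4 b_4 + \dots &= -\mu'_2,\\
\mu'_2 a + 2\mu'_1 b_0 + 3\mu'_2 b_1 + 4\mu'_3 b_2 + 5\mu'_4 b_3 + 6\mu'_5 b_4 + \dots &= -\mu'_3,\\
\mu'_3 a + 3\mu'_2 b_0 + 4\mu'_3 b_1 + 5\mu'_4 b_2 + 6\mu'_5 b_3 + 7\mu'_6 b_4 + \dots &= -\mu'_4,\\
\mu'_4 a + 4\mu'_3 b_0 + 5\mu'_4 b_1 + 6\mu'_5 b_2 + 7\mu'_6 b_3 + 8\mu'_7 b_4 + \dots &= -\mu'_5,\\
\dots\qquad\dots\qquad\dots &
\end{aligned} \qquad \text{(vi.)}.$$

Hence, $a, b_0, b_1, b_2, b_3, \dots$ are at once given in terms of the determinant $\Delta$ and 
its minors, where 

$$\Delta = \begin{vmatrix}
\mu'_0, & 0, & \mu'_0, & 2\mu'_1, & 3\mu'_2, & 4\mu'_3, & \dots\\
\mu'_1, & \mu'_0, & 2\mu'_1, & 3\mu'_2, & 4\mu'_3, & 5\mu'_4, & \dots\\
\mu'_2, & 2\mu'_1, & 3\mu'_2, & 4\mu'_3, & 5\mu'_4, & 6\mu'_5, & \dots\\
\mu'_3, & 3\mu'_2, & 4\mu'_3, & 5\mu'_4, & 6\mu'_5, & 7\mu'_6, & \dots\\
\mu'_4, & 4\mu'_3, & 5\mu'_4, & 6\mu'_5, & 7\mu'_6, & 8\mu'_7, & \dots\\
\dots & & & & & &
\end{vmatrix} \qquad \text{(vii.)}.$$

The results may be simplified slightly by taking the origin at the mean, and the 
moments about the mean, indicating this by dropping the dashes and putting $\mu_1=0$. 
Thus we have the following series of frequency curves, the origin being the 
mean: 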


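(i.) Keeping $b_0$ only 

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\mu_2} \qquad \text{(viii.)}.$$

This is the Laplace-Gaussian normal form. 

(ii.) Keeping $b_0$, $b_1$ only 

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x+\dfrac{\mu_3}{2\mu_2}}{\mu_2+\dfrac{\mu_3}{2\mu_2}\,x} \qquad \text{(ix.)}.$$

This is the Type III. curve of my memoir on skew variation.* 

(iii.) Keeping $b_0$, $b_1$, $b_2$ only 

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x+\dfrac{\mu_3(\mu_4+3\mu_2^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}}{\dfrac{\mu_2(4\mu_2\mu_4-3\mu_3^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}+\dfrac{\mu_3(\mu_4+3\mu_2^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}\,x+\dfrac{2\mu_2\mu_4-3\mu_3^2-6\mu_2^3}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}\,x^2} \qquad \text{(x.)}.$$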
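The determination of the constants from the moments can be sketched numerically. The following is a minimal illustration, not part of the memoir: it assumes the moment equations (v.) in the form $(n)b_0\mu'_{n-1}+(n+1)b_1\mu'_n+\dots=-\mu'_{n+1}-a\mu'_n$, and the helper names `solve_linear` and `pearson_constants` are hypothetical. Exact rational arithmetic is used so that the result can be compared directly with the coefficients appearing in (x.).

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    size = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(size):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def pearson_constants(mu, s):
    """Solve equations (v.) for n = 0..s.

    mu[k] is the k-th moment (mu[0] = 1); moments up to order 2s - 2
    (and at least s + 1) are needed.  Returns (a, [b_0, ..., b_{s-1}]).
    """
    rows, rhs = [], []
    for n in range(s + 1):
        row = [mu[n]]                      # coefficient of a
        for k in range(s):                 # coefficient of b_k
            order = n + k - 1
            row.append((n + k) * mu[order] if order >= 0 else 0)
        rows.append(row)
        rhs.append(-mu[n + 1])
    solution = solve_linear(rows, rhs)
    return solution[0], solution[1:]


# Moments about the mean (illustrative values): mu_2 = 2, mu_3 = 1, mu_4 = 15.
a, b = pearson_constants([1, 0, 2, 1, 15], 3)
print(a, b)
```

With moments taken about the mean, the returned constants are the negatives of the three denominator coefficients of (x.), and `a` equals the coefficient of $x$, which is how the closed form can be checked for any trial set of moments.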


* ‘Phil. Trans.,’ A, vol. 186, p. 373. 
This equation gave Types I.-VI. of my two memoirs on skew variation,* and 
provides at once the expressions 

$$d = \text{distance from mode to mean} = \frac{\sigma\sqrt{\beta_1}\,(\beta_2+3)}{2\,(5\beta_2-6\beta_1-9)} \qquad \text{(xi.)},$$

$$\text{skewness} = \frac{\sqrt{\beta_1}\,(\beta_2+3)}{2\,(5\beta_2-6\beta_1-9)} \qquad \text{(xii.)},$$

where $\sigma=\sqrt{\mu_2}$, $\beta_1=\mu_3^2/\mu_2^3$, $\beta_2=\mu_4/\mu_2^2$, given in my memoir on the theory of errors 
of observation without proof.† 

There is no theoretical limit, however, to this process; we can from (vi.) and (vii.) 
express the $a$ and $b$'s at once in terms of determinants, and expanding obtain forms 
which, like the formulæ of THIELE, will fit closer and closer to the observed 
distribution of frequency, the more moments we take. But there are three fundamental 
practical objections to this. These are the following: 

(a.) Experience shows that the form (x.) suffices for certainly the great bulk of 
frequency distributions, i.e., it describes them effectively within the limits of random 
sampling. 


If the distribution be even approximately normal, the series in the denominator 
converges very rapidly, for the coefficients of every power of $x$ vanish for moments 
obeying the relationships: 

$$\mu_{2s+1} = 0, \qquad \mu_{2s} = (2s-1)\,\mu_2\,\mu_{2s-2},$$

which hold for a normal series. 

(b.) The labour of arithmetic and of analysis becomes very great, if we desire to 
keep higher moments. If we go to $b_4$ we should have to calculate the first eight 
moments of the observations about their centroid, a by no means easy task. Further, 
the classification of the resulting curves and the criteria for the right one to use in a 
special case, although not absolutely prohibitive, if we only go as far as $b_3$, are for 
practical purposes idle in the case of taking into account $b_4$. 

(c.) The probable errors of the higher moments are so large that the values found 
for $\mu_7$, $\mu_8$, &c., are quite untrustworthy, and even that for $\mu_6$ is doubtful,‡ unless we 
have frequency series far larger than usually occur in actual observations. This is a 
strong argument against the utility of any descriptions of frequency, such as those 
suggested by THIELE or LIPPS, which depend upon moments higher than the fifth 
or sixth. 
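As an illustration of how little arithmetic the form (x.) demands in practice, the expressions (xi.) and (xii.) need only the second, third and fourth moments. The sketch below is an assumption-laden illustration rather than anything given in the memoir: the function names are hypothetical, and $\sqrt{\beta_1}$ is given the sign of $\mu_3$, a convention not stated in the text above.

```python
import math


def pearson_betas(mu2, mu3, mu4):
    """Return (beta_1, beta_2) from moments about the mean."""
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def skewness_and_mode_distance(mu2, mu3, mu4):
    """Evaluate (xii.) and (xi.): skewness and distance from mode to mean."""
    beta1, beta2 = pearson_betas(mu2, mu3, mu4)
    # Assumption: the square root takes the sign of the third moment.
    root_beta1 = math.copysign(math.sqrt(beta1), mu3)
    skewness = root_beta1 * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))
    return skewness, math.sqrt(mu2) * skewness


# Illustrative moments: mu_2 = 2, mu_3 = 1, mu_4 = 15.
print(skewness_and_mode_distance(2, 1, 15))
# Normal-law moments (mu_3 = 0, mu_4 = 3 mu_2^2) give no skewness.
print(skewness_and_mode_distance(1, 0, 3))
```

For the illustrative moments the distance comes out equal to the constant $a$ of (x.), as it should, since the mode lies at $x=-a$ when the origin is the mean.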


* ‘Phil. Trans.,’ A, vol. 186, pp. 343-414, and ‘Phil. Trans.,’ A, vol. 197, pp. 443-459. 

† ‘Phil. Trans.,’ A, vol. 198, p. 277. 

‡ In ‘Phil. Trans.,’ A, vol. 185, pp. 71-110, I have given a method of breaking up a frequency 
distribution into two normal series. I obtained long ago the criterion for determining whether such a 
resolution is possible or not. But it involves moments higher than the fifth, and the probable error of the 
criterion is thus so great that for practical purposes it is worthless. 
The question of the probable deviations of the higher moments can be illustrated as 
follows, by finding the standard deviation of the moment when we take a number of 
random samples from a general population. Let $\Sigma_{\mu_s}$ be the standard deviation of $\mu_s$, 
then $100\,\Sigma_{\mu_s}/\mu_s$ is the percentage variability of $\mu_s$ due to random sampling. The table 
below shows the increase of these percentages in the case of the moments of normal 
distributions, which, quite as well as any other, will illustrate the rapid increase in 
probable error as we use higher and higher moments. The general values of the 
standard deviations of some of the moments were first given by CZUBER,* then 
far more completely by SHEPPARD,† and a résumé of all the results recently in 
‘Biometrika.’‡ 


PERCENTAGE Variability in Moments due to Random Sampling when the Series 
is supposed to be Normal. 


Moment.       500 in series.    1000 in series. 
$\mu_2$             6.3               4.5 
$\mu_4$            14.6              10.3 
$\mu_6$            30.1              21.3 
$\mu_8$            60.6              42.9 
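The entries of this table can be approximately reproduced. The sketch below assumes, as a simplification not spelt out in the text, that for an even moment of a normal series the sampling variance is $(\mu_{2s}-\mu_s^2)/N$, the correction terms involving odd moments vanishing; the function names are hypothetical.

```python
import math


def normal_moment(k, sigma=1.0):
    """Central moment of order k of a normal law: (k-1)!! sigma^k, or 0 if k is odd."""
    if k % 2:
        return 0.0
    product = 1.0
    for j in range(1, k, 2):
        product *= j
    return product * sigma ** k


def pct_variability(s, n):
    """100 * (standard deviation of mu_s) / mu_s for an even moment, sample size n."""
    variance = (normal_moment(2 * s) - normal_moment(s) ** 2) / n
    return 100 * math.sqrt(variance) / normal_moment(s)


for order in (2, 4, 6, 8):
    print(order, round(pct_variability(order, 500), 1),
          round(pct_variability(order, 1000), 1))
```

Under this assumption the second, fourth and sixth moments agree with the tabled figures to the first decimal, and the eighth moment to within about a tenth of a per cent.; each 1000-column entry is the 500-column entry divided by $\sqrt{2}$.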


Precisely the same rapid increase takes place when we find the variabilities of the 
ratios $\mu_4/\mu_2^2$, $\mu_6/\mu_2^3$, $\mu_8/\mu_2^4$, &c., which are the forms in which the moments actually 
occur in our coefficients. In this case we have to remember that errors in the 
moments are correlated, but the correlations are given in the papers cited above. I 
find in this case the following series, which is almost as suggestive as the previous 
table. 


PERCENTAGE Variabilities in Ratio of Moments due to Random Sampling, the 
Series being Normal. 


Ratio.              500 in series.    1000 in series. 
$\mu_4/\mu_2^2$           7.3               5.2 
$\mu_6/\mu_2^3$          23.3              16.5 
$\mu_8/\mu_2^4$          55.1              39.0 
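A rough check of these ratio variabilities is possible with the usual first-order (small-deviation) expansion. The sketch assumes that the covariance of two even moments of a normal series is $(\mu_{p+q}-\mu_p\mu_q)/N$; both this simplification and the function names are assumptions made for illustration only.

```python
import math


def normal_moment(k):
    """Central moment of order k of a unit normal law: (k-1)!!, or 0 if k is odd."""
    if k % 2:
        return 0.0
    product = 1.0
    for j in range(1, k, 2):
        product *= j
    return product


def pct_variability_ratio(s, n):
    """Percentage variability of mu_s / mu_2^(s/2), s even, sample size n."""
    m = normal_moment
    half = s // 2
    rel_var = ((m(2 * s) - m(s) ** 2) / m(s) ** 2          # variance of mu_s
               + half ** 2 * (m(4) - m(2) ** 2) / m(2) ** 2  # variance of mu_2
               - 2 * half * (m(s + 2) - m(s) * m(2)) / (m(s) * m(2)))  # covariance
    return 100 * math.sqrt(rel_var / n)


for order in (4, 6, 8):
    print(order, round(pct_variability_ratio(order, 500), 1),
          round(pct_variability_ratio(order, 1000), 1))
```

On these assumptions the first two rows come out at the tabled values, and the third within a few tenths of a per cent. of them.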


The order of this increase of percentage variability, and therefore of probable error, 
is the same for skew as for normal variation, and it seems therefore, with the length 


* ‘Theorie der Beobachtungsfehler,’ S. 130, et seq. 
† ‘Phil. Trans.,’ A, vol. 192, pp. 122, et seq. 

‡ Vol. II., pp. 273-281. 

§ Ibid., p. 277. 
of the series in customary use, idle to use the 7th or 8th moments; these have 
variabilities varying from 30 to 60 per cent. of their values, and accordingly we might 
easily on a random sample reach a 7th or 8th moment having half, or double the value 
it actually has in the general population. Constants based on these high moments 
will be practically idle. They may enable us to describe closely an individual random 
sample, but no safe argument can be drawn from this individual sample as to the 
general population at large, at any rate so far as the argument is based on the constants 
depending upon these high moments. 

It seems to me accordingly obvious that, bearing in mind the object of a theory of 
frequency (i.e., the description of the distribution in the general population by aid of 
a graduated sample, agreeing with the general population within the probable errors 
of random sampling), we can dismiss from practical use all theories which call upon 
us to use moments as high as the seventh or eighth. Any use of the general form 
(ii.) beyond $b_3$, indirectly or directly, involves such higher moments. Personally I am 
inclined to doubt whether the continental series using higher moments are, from the 
standpoint of graduation, nearly as good as my form (ii.). 

Hence we seem driven to the skew curves embraced in (x.) as a practical frequency 
series. If we have a frequency not described by (x.) we may, perhaps, use $\mu_5$ and $\mu_6$,* 
but it is difficult to see how its description can possibly be bettered by the use of 
still higher moments. This may seem a counsel of despair; but it is very far from 
being so in reality when we remember that (x.) has proved its efficiency now (I might 
almost say, without exception) in a wide range of economic, physical, biometric, and 
actuarial data. 

In this memoir on skew correlation I shall accordingly confine my attention, for the 
most part, to constants the discovery of which does not involve the use of moments 
or products of higher than six dimensions, judging all above this limit to be, as a rule, 
disqualified for practical service by the magnitude of their probable errors. 


(2.) Generalised Idea of Correlation. 


Given any two variables or characters A and B, we say that they are correlated 
when, with different values $x$ of A, we do not find the same value $y$ of B equally likely 
to be associated. In other words, certain values of B are relatively more likely to 
occur with the value $x$ than others. The distribution of B's associated with a given 
value $x$ of A is termed an $x$-array of B's. If $N$ pairs of A and B are taken, and $n_x$ of 
these have the character A = $x$, these $n_x$ form the $x$-array of B's. This array, like any 
other frequency distribution, will have its mean, which we will denote by $\bar{y}_x$, and its 

* Referring to equation (ii.), I propose to call curves which stop at $b_s$ skew curves of the $s$th order. 
Thus the normal curve is a skew curve of zero order; curve of Type III. is a skew curve of the 1st order; 
Types I., II., V., and VI. are of the 2nd order. I hope shortly to publish a discussion of skew curves of the 
3rd order to complete the practically legitimate range of such curves. 
standard deviation, which we will denote by $\sigma_{n_x}$. The mean of all the B characters 
shall be $\bar{y}$ and their variability given by the standard deviation $\sigma_y$. Similarly $\bar{x}$, $\sigma_x$ 
will denote the mean and standard deviation of the A's, and $n_y$, $\bar{x}_y$, and $\sigma_{n_y}$ the 
number of individuals, the mean and the standard deviation for a $y$-array of A's. 

Now clearly a knowledge of $\bar{y}_x$ and $\sigma_{n_x}$ will not fix the B's which will be found 
associated with a given A, but it will define the limits of probable or even possible 
B's. The curve obtained by plotting $\bar{y}_x$ to $x$ is termed the regression curve of $y$ on $x$. 
A curve in which the ratio of $\sigma_{n_x}$ to the standard deviation $\sigma_y$ is plotted to $x$ may be 
termed a scedastic* curve. Since the standard deviation is always a positive 
quantity, this curve always lies on one side of the axis; it is a horizontal line in the 
case of normal correlation, i.e., the Gauss-Laplacian distribution of deviations, and 
coincides with the axis in any case where correlation passes into causation, i.e., when 
one value of B only is associated with each A. 
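The ordinates of the regression and scedastic curves are easily computed from a correlation table. The following is only a sketch under assumed conventions: the table is held as a list of rows, one row per $x$-array, with `table[i][j]` the frequency of the pair (i-th value of A, j-th value of B); the function name and the sample frequencies are hypothetical.

```python
import math


def regression_and_scedastic(ys, table):
    """Return, for each x-array, (n_x, array mean, array s.d. / sigma_y)."""
    total = sum(sum(row) for row in table)
    y_bar = sum(f * y for row in table for f, y in zip(row, ys)) / total
    var_y = sum(f * (y - y_bar) ** 2
                for row in table for f, y in zip(row, ys)) / total
    sigma_y = math.sqrt(var_y)
    out = []
    for row in table:
        n_x = sum(row)
        mean = sum(f * y for f, y in zip(row, ys)) / n_x
        var = sum(f * (y - mean) ** 2 for f, y in zip(row, ys)) / n_x
        out.append((n_x, mean, math.sqrt(var) / sigma_y))
    return out


# Illustrative frequencies only: three x-arrays, three values of B.
ys = [1, 2, 3]
table = [[4, 2, 0],
         [1, 6, 1],
         [0, 2, 4]]
for n_x, mean, ratio in regression_and_scedastic(ys, table):
    print(n_x, round(mean, 3), round(ratio, 3))
```

Plotting the second column against $x$ gives points on the regression curve, and the third column points on the scedastic curve.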

The mean ordinate of this curve would clearly be a sort of general measure of the 
degree of correlation between A and B, but it seems for many reasons better to base 
our measure on the mean square of the weighted standard deviations of the arrays, or 

$$\sigma_{a_y}^2 = S(n_x\sigma_{n_x}^2)/N \qquad \text{(xiii.)}.$$

$\sigma_{a_y}$ will thus measure the average variability in B to be found associated with any A; 
its vanishing will mean that the scedastic curve as defined above will coincide with 
the axis. Now let a new quantity $\eta$, defined by 

$$\sigma_{a_y}^2 = (1-\eta^2)\,\sigma_y^2 \qquad \text{(xiv.)},$$

be introduced. Then clearly $\eta$ must lie between $\pm 1$, because $\sigma_{a_y}^2$ cannot be negative, 
being the sum of a number of positive squares. I term $\eta$ the correlation ratio, to 
distinguish it from the correlation coefficient represented by $r$. When $\eta=\pm 1$ the 
correlation is perfect or we have causation. Further we have by a well-known 
property of moments, if 

$$\sigma_M^2 = S\{n_x(\bar{y}_x-\bar{y})^2\}/N \qquad \text{(xv.)},$$

$$\sigma_y^2 = \sigma_{a_y}^2+\sigma_M^2,$$

or 

$$\eta = \sigma_M/\sigma_y \qquad \text{(xvi.)}.$$

This shows us that the correlation ratio is the ratio of the variability of the means 
of the $x$-arrays to the variability of B's in general. If $\eta=0$, it follows that $\sigma_M$ is 
zero, or from (xv.) that every $\bar{y}_x=\bar{y}$, i.e., there is no association of B's with special 
A's at all, or correlation is zero. Thus the correlation ratio $\eta$, as defined by either 
(xiv.) or (xvi.), is an excellent measure of the stringency of correlation, always lying 
numerically between the values 0 and 1, which mark absolute independence and 

* I.e., a curve which measures the “scatter” in the arrays. 
complete causation respectively. Further, remembering the definition of $r$, the 
coefficient of correlation, i.e., 

$$N\sigma_x\sigma_y\,r = S\{n_{xy}(x-\bar{x})(y-\bar{y})\} = S\{n_x(x-\bar{x})(\bar{y}_x-\bar{y})\} \qquad \text{(xvii.)},$$

we have, from (xv.) and (xvii.), 

$$N(\eta^2-r^2)\,\sigma_y^2 = S\Big[n_x(\bar{y}_x-\bar{y})\Big\{\bar{y}_x-\bar{y}-\frac{r\sigma_y}{\sigma_x}(x-\bar{x})\Big\}\Big].$$

Now let 

$$Y = \bar{y}+\frac{r\sigma_y}{\sigma_x}(x-\bar{x}) \qquad \text{(xviii.)};$$

then (xviii.), as is well known, gives the best fitting straight line to the series of 
points $\bar{y}_x$ loaded with their respective $n_x$. We can now write 

$$N(\eta^2-r^2)\,\sigma_y^2 = S\{n_x(\bar{y}_x-Y)^2\}+S\{n_x(Y-\bar{y})(\bar{y}_x-Y)\}.$$

But, using (xviii.), 

$$\begin{aligned}
S\{n_x(Y-\bar{y})(\bar{y}_x-Y)\} &= \frac{r\sigma_y}{\sigma_x}\,S\Big[n_x(x-\bar{x})\Big\{\bar{y}_x-\bar{y}-\frac{r\sigma_y}{\sigma_x}(x-\bar{x})\Big\}\Big]\\
&= \frac{r\sigma_y}{\sigma_x}\Big(Nr\sigma_x\sigma_y-\frac{r\sigma_y}{\sigma_x}N\sigma_x^2\Big)\\
&= 0.
\end{aligned}$$

Thus the last summation vanishes, and we have 

$$N(\eta^2-r^2)\,\sigma_y^2 = S\{n_x(\bar{y}_x-Y)^2\} \qquad \text{(xix.)}.$$

The right-hand side must always be positive, unless $\bar{y}_x=Y$, when it is zero. Hence 
we conclude that $\eta$ is always greater than $r$, or the correlation ratio greater than the 
correlation coefficient, except in the special case when the means of the $x$-arrays of $y$'s 
all fall on a straight line, i.e., we have linear regression, and then the two correlation 
constants are equal. 

Thus the expression $(\eta^2-r^2)\,\sigma_y^2$ has an important physical meaning; it is the mean 
square deviation of the regression curve from the straight line which fits this curve 
most closely.* We have now freed our treatment of correlation from any condition 
as to linearity of the regression, and it remains to consider the probable errors of the 
various quantities dealt with. 
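The relation (xix.) between the correlation ratio and the correlation coefficient can be verified on any table of frequencies. The sketch below is illustrative only; the table layout (one row per $x$-array), the function name and the sample frequencies are assumptions.

```python
import math


def eta_r_and_identity(xs, ys, table):
    """Return (eta, r, N(eta^2 - r^2) sigma_y^2, S{n_x (mean_x - Y)^2})."""
    total = sum(sum(row) for row in table)
    n_x = [sum(row) for row in table]
    x_bar = sum(n * x for n, x in zip(n_x, xs)) / total
    y_bar = sum(f * y for row in table for f, y in zip(row, ys)) / total
    var_x = sum(n * (x - x_bar) ** 2 for n, x in zip(n_x, xs)) / total
    var_y = sum(f * (y - y_bar) ** 2
                for row in table for f, y in zip(row, ys)) / total
    means = [sum(f * y for f, y in zip(row, ys)) / n
             for row, n in zip(table, n_x)]
    var_m = sum(n * (m - y_bar) ** 2 for n, m in zip(n_x, means)) / total
    cov = sum(n * (x - x_bar) * (m - y_bar)
              for n, x, m in zip(n_x, xs, means)) / total
    eta = math.sqrt(var_m / var_y)                 # (xvi.)
    r = cov / math.sqrt(var_x * var_y)             # (xvii.)
    slope = cov / var_x                            # r sigma_y / sigma_x
    residual = sum(n * (m - (y_bar + slope * (x - x_bar))) ** 2
                   for n, x, m in zip(n_x, xs, means))
    return eta, r, total * (eta ** 2 - r ** 2) * var_y, residual


# Illustrative frequencies with a visibly non-linear regression.
xs, ys = [1, 2, 3], [1, 2, 3]
table = [[5, 1, 0],
         [0, 2, 6],
         [1, 5, 2]]
print(eta_r_and_identity(xs, ys, table))
```

The last two returned values are the two sides of (xix.); they coincide, and $\eta$ exceeds $r$ numerically whenever the array means do not lie on a straight line.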


(3.) Probable Errors of Constants of Correlation. 


We shall first prove a number of general propositions relating to the probable 
errors of correlation constants. We first note that if n and n’ be the frequencies in 


* The properties of the correlation ratio were briefly noted in a footnote to a paper by the author in 
‘Roy. Soc. Proc.,’ vol. 71, pp. 303-4. It has been systematically used in my laboratory for some years 
and determined alongside $r$ for many distributions. 
any two sub-groups of a total $N$, for which no member of $n$ is a member of $n'$, then 
the standard deviation of $n$ due to random sampling is given by 

$$\Sigma_n^2 = n\Big(1-\frac{n}{N}\Big) \qquad \text{(xx.)},$$

and the correlation between deviations in $n$ and $n'$ due to random sampling is given 
by 

$$\Sigma_n\Sigma_{n'}R_{nn'} = -\frac{nn'}{N} \qquad \text{(xxi.)}.$$
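The two results (xx.) and (xxi.) are the familiar variance and product-moment of multinomial frequencies, and can be confirmed by direct enumeration for a small total. The sketch assumes that the observed frequencies $n$, $n'$ stand for their expected values $Np$, $Np'$; the function name is hypothetical.

```python
from math import comb


def subgroup_moments(total, p, q):
    """Exact variance of n and covariance of (n, n') for two exclusive
    sub-groups with chances p and q, by enumerating every possible sample."""
    rest = 1.0 - p - q
    e_n = e_m = e_nn = e_nm = 0.0
    for n in range(total + 1):
        for m in range(total - n + 1):
            prob = (comb(total, n) * comb(total - n, m)
                    * p ** n * q ** m * rest ** (total - n - m))
            e_n += prob * n
            e_m += prob * m
            e_nn += prob * n * n
            e_nm += prob * n * m
    return e_nn - e_n ** 2, e_nm - e_n * e_m


# N = 12 with expected frequencies n = 3 and n' = 6.
variance, covariance = subgroup_moments(12, 0.25, 0.5)
print(variance, 3 * (1 - 3 / 12))      # compare with (xx.)
print(covariance, -3 * 6 / 12)         # compare with (xxi.)
```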


Problem I. To find the correlation in deviations due to random sampling between 
the number $n_{x_p}$ in the $x_p$-array of $y$'s and the number $n_{y_s}$ in the $y_s$-array of $x$'s. 

If the symbol $\delta n$ denote the error or deviation in $n$, we have with an obvious 
subscript notation* 

$$\delta n_{x_p} = \delta n_{x_py_1}+\delta n_{x_py_2}+\delta n_{x_py_3}+\dots+\delta n_{x_py_q},$$

if there be $q$ groups of $y$'s, and again 

$$\delta n_{y_s} = \delta n_{x_1y_s}+\delta n_{x_2y_s}+\delta n_{x_3y_s}+\dots+\delta n_{x_ly_s},$$

if there be $l$ groups of $x$'s. 
Multiply the expressions for $\delta n_{x_p}$ and $\delta n_{y_s}$ together and we have 

$$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_py_s})^2+S(\delta n_{x_py_u}\,\delta n_{x_vy_s}),$$

where the summation is for every pair of values of $u$ and $v$, differing from $s$ and $p$. 
Summing all such pairs of values for every random sample and dividing by the 
number of samples taken, we have the usual definition of correlation 

$$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}\Big(1-\frac{n_{x_py_s}}{N}\Big)-S\Big(\frac{n_{x_py_u}n_{x_vy_s}}{N}\Big),$$

or, 

$$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}-\frac{n_{x_p}n_{y_s}}{N} \qquad \text{(xxii.)}.$$

This gives $R_{n_{x_p}n_{y_s}}$, the required correlation, since $\Sigma_{n_{x_p}}$ and $\Sigma_{n_{y_s}}$ are known from (xx.). 
Problem II. To find the correlation between deviations in the total $n_{x_p}$ of any array 
and in any sub-group $n_{x_py_s}$ of this array. 

We have at once 

$$\delta n_{x_p}\,\delta n_{x_py_s} = (\delta n_{x_py_s})^2+S(\delta n_{x_py_s}\,\delta n_{x_py_u}),$$

where $u$ is to be taken every value other than $s$ in the summation term. Summing 
for all random samples and dividing by their number, we have, after using results 
like (xx.) and (xxi.), 

$$R_{n_{x_p}n_{x_py_s}}\times\Sigma_{n_{x_p}}\Sigma_{n_{x_py_s}} = n_{x_py_s}\Big(1-\frac{n_{x_p}}{N}\Big) \qquad \text{(xxiii.)},$$

which gives $R_{n_{x_p}n_{x_py_s}}$. 


* $n_{xy}$ = frequency of groups with characters $x$ and $y$. 
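Proposition III. There is no correlation between deviations in the mean of an 
$x$-array $\bar{y}_{x_p}$ and the total number in that array. 

We have, since $n_{x_p}\bar{y}_{x_p}=S(n_{x_py_u}y_u)$, 

$$n_{x_p}\,\delta\bar{y}_{x_p} = S(\delta n_{x_py_u}\,y_u)-\bar{y}_{x_p}\,\delta n_{x_p},$$

$$n_{x_p}\,\delta\bar{y}_{x_p}\,\delta n_{x_p} = -\bar{y}_{x_p}(\delta n_{x_p})^2+S(\delta n_{x_p}\,\delta n_{x_py_u}\,y_u).$$

Hence as before, using (xxiii.), &c., 

$$n_{x_p}\Sigma_{\bar{y}_{x_p}}\Sigma_{n_{x_p}}R_{\bar{y}_{x_p}n_{x_p}} = -\bar{y}_{x_p}n_{x_p}\Big(1-\frac{n_{x_p}}{N}\Big)+S\Big\{n_{x_py_u}y_u\Big(1-\frac{n_{x_p}}{N}\Big)\Big\} = 0,$$

which proves that $R_{\bar{y}_{x_p}n_{x_p}}$ is zero. 

Proposition IV. There is no correlation between deviations in the mean of an 
$x$-array and in the total number in any other array. 

Proof as before. 

Proposition V. There is no correlation between deviations in the mean of one 
$x$-array and in the mean of a second $x$-array. 

We have 

$$n_{x_p}\,\delta\bar{y}_{x_p} = S(\delta n_{x_py_u}\,y_u)-\bar{y}_{x_p}\,\delta n_{x_p},$$

$$n_{x_{p'}}\,\delta\bar{y}_{x_{p'}} = S(\delta n_{x_{p'}y_{u'}}\,y_{u'})-\bar{y}_{x_{p'}}\,\delta n_{x_{p'}}.$$

Multiply these two expressions together, sum for all random samples, and divide 
by the number of such samples. We find 

$$\begin{aligned}
n_{x_p}n_{x_{p'}}\Sigma_{\bar{y}_{x_p}}\Sigma_{\bar{y}_{x_{p'}}}R_{\bar{y}_{x_p}\bar{y}_{x_{p'}}} ={}& -\bar{y}_{x_p}\bar{y}_{x_{p'}}\frac{n_{x_p}n_{x_{p'}}}{N}\\
&+\bar{y}_{x_{p'}}\,S(n_{x_py_u}n_{x_{p'}}y_u)/N\\
&+\bar{y}_{x_p}\,S(n_{x_p}n_{x_{p'}y_{u'}}y_{u'})/N\\
&-S(n_{x_py_u}n_{x_{p'}y_{u'}}y_uy_{u'})/N\\
={}& -\bar{y}_{x_p}\bar{y}_{x_{p'}}\frac{n_{x_p}n_{x_{p'}}}{N}+2\,\frac{n_{x_p}n_{x_{p'}}}{N}\,\bar{y}_{x_p}\bar{y}_{x_{p'}}-\frac{S(n_{x_py_u}y_u)\times S(n_{x_{p'}y_{u'}}y_{u'})}{N}.
\end{aligned}$$

The last term is $-\dfrac{n_{x_p}n_{x_{p'}}}{N}\,\bar{y}_{x_p}\bar{y}_{x_{p'}}$, and thus the right-hand side is identically zero. It 
thus appears that there is no correlation between errors made in finding the means of 
two arrays. This result is not at once obvious, although a very little consideration 
shows it must be true. 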
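Proposition VI. To prove that the standard deviation of the mean $\bar{y}_{x_p}$ of any 
$x$-array due to random sampling equals $\sigma_{n_{x_p}}/\sqrt{n_{x_p}}$. 

We have 

$$n_{x_p}\,\delta\bar{y}_{x_p} = S(\delta n_{x_py_u}\,y_u)-\bar{y}_{x_p}\,\delta n_{x_p}.$$

Square, sum for all random samples, and divide by the number of such samples. 
We have 

$$\begin{aligned}
n_{x_p}^2\Sigma_{\bar{y}_{x_p}}^2 ={}& \bar{y}_{x_p}^2\,n_{x_p}\Big(1-\frac{n_{x_p}}{N}\Big)-2\bar{y}_{x_p}\,S\Big\{n_{x_py_u}\Big(1-\frac{n_{x_p}}{N}\Big)y_u\Big\}\\
&+S\Big\{n_{x_py_u}\Big(1-\frac{n_{x_py_u}}{N}\Big)y_u^2\Big\}-2S\Big\{\frac{n_{x_py_u}n_{x_py_{u'}}}{N}\,y_uy_{u'}\Big\}\\
={}& \bar{y}_{x_p}^2\,n_{x_p}\Big(1-\frac{n_{x_p}}{N}\Big)-2\bar{y}_{x_p}^2\,n_{x_p}\Big(1-\frac{n_{x_p}}{N}\Big)+S(n_{x_py_u}y_u^2)-\frac{S(n_{x_py_u}y_u)\,S(n_{x_py_{u'}}y_{u'})}{N}\\
={}& S(n_{x_py_u}y_u^2)-n_{x_p}\bar{y}_{x_p}^2\\
={}& n_{x_p}\sigma_{n_{x_p}}^2.
\end{aligned}$$

Hence 

$$\Sigma_{\bar{y}_{x_p}} = \frac{\sigma_{n_{x_p}}}{\sqrt{n_{x_p}}} \qquad \text{(xxiv.)}.$$

Thus the probable error of the mean of an array has exactly the same form as the 
probable error of the mean of a random sample of a definite number of individuals. 
The array may have a variable number of individuals, but we have seen in 
Proposition III. that there is no correlation between errors in its mean and errors in 
the total number of individuals contained in it. 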

Problem VII. To find the probable error of the standard deviation of any array. 

By a precisely similar investigation to that of the previous proposition we find 

$$\Sigma_{\sigma_{n_{x_p}}} = \sqrt{\frac{m_4-m_2^2}{4\,n_{x_p}m_2}} \qquad \text{(xxv.)},$$

where 

$$m_q = \frac{1}{n_{x_p}}\,S\{n_{x_py_u}(y_u-\bar{y}_{x_p})^q\}.$$

This is identical with the probable error we should have if the array were a random 
sample of constant size. 

In many cases it will be sufficiently approximate to put $m_4=3m_2^2$, and we then 
have 

$$0.67449\,\Sigma_{\sigma_{n_{x_p}}} = 0.67449\,\frac{\sigma_{n_{x_p}}}{\sqrt{2n_{x_p}}} \qquad \text{(xxvi.)},$$

the well-known form for the probable error of the standard deviation of a normal 
distribution of a definite number of individuals. 
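The results of Proposition VI. and Problem VII. reduce to a few lines of arithmetic for any single array. The sketch below is illustrative; the function names are hypothetical, and 0.67449 is the usual factor converting a standard deviation into a probable error.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449


def array_mean_probable_error(sd_array, n_array):
    """Probable error of an array mean: 0.67449 * sigma / sqrt(n), from (xxiv.)."""
    return PROBABLE_ERROR_FACTOR * sd_array / math.sqrt(n_array)


def array_sd_sampling_sd(m2, m4, n_array):
    """Standard deviation of an array's s.d., from (xxv.)."""
    return math.sqrt((m4 - m2 * m2) / (4 * n_array * m2))


# With m_4 = 3 m_2^2 the general form (xxv.) reduces to sigma / sqrt(2n), as in (xxvi.).
m2, n_array = 4.0, 50
print(array_sd_sampling_sd(m2, 3 * m2 * m2, n_array), math.sqrt(m2) / math.sqrt(2 * n_array))
print(array_mean_probable_error(math.sqrt(m2), n_array))
```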

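Problem VIII. To find the standard deviation of the standard deviation $\sigma_M$ of the 
means of the arrays due to random sampling. 

Since 

$$N\sigma_M^2 = S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})^2\},$$

$$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(\bar{y}_{x_p}-\bar{y})^2\}+2S\{\delta\bar{y}_{x_p}\,n_{x_p}(\bar{y}_{x_p}-\bar{y})\}-2\,\delta\bar{y}\,S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})\},$$

the last term of which vanishes, since $N\bar{y}=S(n_{x_p}\bar{y}_{x_p})$. 

Square the above relation, sum for all random samples, and divide by the number 
of such samples. We find 

$$\begin{aligned}
4N^2\sigma_M^2\Sigma_{\sigma_M}^2 ={}& S\Big\{n_{x_p}\Big(1-\frac{n_{x_p}}{N}\Big)(\bar{y}_{x_p}-\bar{y})^4\Big\}\\
&-2S\Big\{\frac{n_{x_p}n_{x_{p'}}}{N}(\bar{y}_{x_p}-\bar{y})^2(\bar{y}_{x_{p'}}-\bar{y})^2\Big\}\\
&+4S\{\Sigma_{n_{x_p}}\Sigma_{\bar{y}_{x_p}}R_{n_{x_p}\bar{y}_{x_p}}\,n_{x_p}(\bar{y}_{x_p}-\bar{y})^3\}\\
&+4S\{\Sigma_{n_{x_p}}\Sigma_{\bar{y}_{x_{p'}}}R_{n_{x_p}\bar{y}_{x_{p'}}}\,n_{x_{p'}}(\bar{y}_{x_p}-\bar{y})^2(\bar{y}_{x_{p'}}-\bar{y})\}\\
&+4S\{\Sigma_{\bar{y}_{x_p}}\Sigma_{\bar{y}_{x_{p'}}}R_{\bar{y}_{x_p}\bar{y}_{x_{p'}}}\,n_{x_p}n_{x_{p'}}(\bar{y}_{x_p}-\bar{y})(\bar{y}_{x_{p'}}-\bar{y})\}\\
&+4S\{\Sigma_{\bar{y}_{x_p}}^2\,n_{x_p}^2(\bar{y}_{x_p}-\bar{y})^2\}.
\end{aligned}$$

But $R_{n_{x_p}\bar{y}_{x_p}}$, $R_{n_{x_p}\bar{y}_{x_{p'}}}$ and $R_{\bar{y}_{x_p}\bar{y}_{x_{p'}}}$ vanish by Propositions III., IV., and V. Further, by 
(xxiv.), $\Sigma_{\bar{y}_{x_p}}^2=\sigma_{n_{x_p}}^2/n_{x_p}$. Hence we have 

$$\begin{aligned}
4N^2\sigma_M^2\Sigma_{\sigma_M}^2 ={}& S\Big\{n_{x_p}\Big(1-\frac{n_{x_p}}{N}\Big)(\bar{y}_{x_p}-\bar{y})^4\Big\}-2S\Big\{\frac{n_{x_p}n_{x_{p'}}}{N}(\bar{y}_{x_p}-\bar{y})^2(\bar{y}_{x_{p'}}-\bar{y})^2\Big\}\\
&+4S\{n_{x_p}\sigma_{n_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\}\\
={}& S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})^4\}-\frac{1}{N}\big[S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})^2\}\big]^2+4S\{n_{x_p}\sigma_{n_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\}.
\end{aligned}$$

Now let 

$$N\lambda_n = S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})^n\}$$

be the $n$th moment of the means of the arrays about their mean. Then clearly 
$\lambda_2=\sigma_M^2$. Further, since $S(n_{x_p}\sigma_{n_{x_p}}^2)=N\sigma_y^2(1-\eta^2)$, we can write 

$$S\{n_{x_p}\sigma_{n_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\} = N\sigma_M^2(1-\eta^2)\,\sigma_y^2\times\chi_1,$$

where $\chi_1$ is a purely numerical constant, which is equal to unity for those cases in 
which there is no correlation between the standard deviation of an array and the 
square of its mean's deviation from the mean. Thus finally we find 

$$\Sigma_{\sigma_M}^2 = \frac{\lambda_4-\lambda_2^2}{4N\lambda_2}+\frac{\sigma_y^2(1-\eta^2)\,\chi_1}{N} \qquad \text{(xxvii.)}.$$

This enables us at once to find the probable error of the standard deviation of the 
means of the arrays. 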
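The quantities entering (xxvii.) can all be obtained from the array means and standard deviations. The sketch below is a hedged illustration: it takes (xxvii.) in the form $\Sigma_{\sigma_M}^2=(\lambda_4-\lambda_2^2)/(4N\lambda_2)+\sigma_y^2(1-\eta^2)\chi_1/N$, and the table layout, function name and frequencies are assumptions.

```python
def sigma_m_sampling_variance(ys, table):
    """Return (chi_1, sampling variance of sigma_M) for a correlation table
    whose rows are the x-arrays of y's."""
    total = sum(sum(row) for row in table)
    y_bar = sum(f * y for row in table for f, y in zip(row, ys)) / total
    var_y = sum(f * (y - y_bar) ** 2
                for row in table for f, y in zip(row, ys)) / total
    lam2 = lam4 = weighted = 0.0
    for row in table:
        n_x = sum(row)
        mean = sum(f * y for f, y in zip(row, ys)) / n_x
        var = sum(f * (y - mean) ** 2 for f, y in zip(row, ys)) / n_x
        lam2 += n_x * (mean - y_bar) ** 2 / total
        lam4 += n_x * (mean - y_bar) ** 4 / total
        weighted += n_x * var * (mean - y_bar) ** 2
    eta_sq = lam2 / var_y
    chi1 = weighted / (total * var_y * (1 - eta_sq) * lam2)
    variance = ((lam4 - lam2 ** 2) / (4 * total * lam2)
                + var_y * (1 - eta_sq) * chi1 / total)
    return chi1, variance


# Illustrative table in which every array has the same scatter, so chi_1 = 1.
ys = [1, 2, 3, 4, 5]
table = [[1, 2, 1, 0, 0],
         [0, 1, 2, 1, 0],
         [0, 0, 1, 2, 1]]
print(sigma_m_sampling_variance(ys, table))
```

When the arrays are homoscedastic, as in this illustrative table, $\chi_1$ comes out as unity, in agreement with the remark following its definition.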

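Proposition IX. To find the correlation between the deviations due to random 
sampling in the values of $\sigma_y$ and $\sigma_M$. 

We have 

$$N\sigma_y^2 = S\{n_{y_s}(y_s-\bar{y})^2\},$$

$$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s-\bar{y})^2\}-2\,\delta\bar{y}\,S\{n_{y_s}(y_s-\bar{y})\};$$

the last term vanishes because $S(n_{y_s}y_s)=N\bar{y}$. Thus 

$$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s-\bar{y})^2\}.$$

But from the previous proposition 

$$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(\bar{y}_{x_p}-\bar{y})^2\}+2S\{\delta\bar{y}_{x_p}\,n_{x_p}(\bar{y}_{x_p}-\bar{y})\}.$$

Multiply these two expressions together, sum for all random samples and divide by 
the number of such samples; we find 

$$\begin{aligned}
4N^2\sigma_M\sigma_y\Sigma_{\sigma_M}\Sigma_{\sigma_y}R_{\sigma_y\sigma_M} ={}& S\{\Sigma_{n_{y_s}}\Sigma_{n_{x_p}}R_{n_{y_s}n_{x_p}}(y_s-\bar{y})^2(\bar{y}_{x_p}-\bar{y})^2\}\\
&+2S\{n_{x_p}\Sigma_{n_{y_s}}\Sigma_{\bar{y}_{x_p}}R_{n_{y_s}\bar{y}_{x_p}}(y_s-\bar{y})^2(\bar{y}_{x_p}-\bar{y})\}.
\end{aligned}$$

To evaluate this, we require to find the two correlations expressed by $R_{n_{y_s}n_{x_p}}$ and 
$R_{n_{y_s}\bar{y}_{x_p}}$. We will consider the two summation terms separately. 

First Term. 

$$\delta n_{x_p} = \delta n_{x_py_1}+\delta n_{x_py_2}+\dots+\delta n_{x_py_{s'}}+\dots$$

$$\delta n_{y_s} = \delta n_{y_sx_1}+\delta n_{y_sx_2}+\dots+\delta n_{y_sx_{p'}}+\dots$$

$$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_py_s})^2+S(\delta n_{x_py_{s'}}\,\delta n_{x_{p'}y_s}),$$

where in the summation $p'$ and $s'$ are not equal to $p$ and $s$. 
Proceeding in the usual manner we find 

$$\begin{aligned}
\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} &= n_{x_py_s}\Big(1-\frac{n_{x_py_s}}{N}\Big)-S\Big(\frac{n_{x_py_{s'}}n_{x_{p'}y_s}}{N}\Big)\\
&= n_{x_py_s}-\frac{S(n_{x_py_{s'}})\times S(n_{x_{p'}y_s})}{N},
\end{aligned}$$

where in the first sum $s'$ is to take all possible values, and in the second $p'$ is to take 
all possible values. Thus we have 

$$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}-\frac{n_{x_p}n_{y_s}}{N} \qquad \text{(xxviii.)}.$$

Substituting we find 

$$\text{First Term} = S_1\{n_{x_py_s}(y_s-\bar{y})^2(\bar{y}_{x_p}-\bar{y})^2\}-S_2\Big\{\frac{n_{x_p}n_{y_s}}{N}(y_s-\bar{y})^2(\bar{y}_{x_p}-\bar{y})^2\Big\}.$$

Here both the summations are really double summations; fixing our attention on 
any $x_p$, i.e., on any array of $y$'s for a given value of $x$, we have first to sum for all $y$'s 
in this array, and then we have to sum for all arrays. This is the meaning of $S_1$. In 
$S_2$ we are to associate every array of $x$'s with every array of $y$'s; hence this term will 
break up at once into two factors, i.e., 

$$\frac{1}{N}\,S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})^2\}\times S\{n_{y_s}(y_s-\bar{y})^2\} = \sigma_M^2\times S\{n_{y_s}(y_s-\bar{y})^2\} = N\sigma_M^2\sigma_y^2.$$

Keeping $x_p$ constant first in $S_1$, we see that 

$$S\{n_{x_py_s}(y_s-\bar{y})^2\}$$

is the 2nd moment of the $y$'s in the $x_p$ array about the mean of the system 

$$= n_{x_p}\{\sigma_{n_{x_p}}^2+(\bar{y}_{x_p}-\bar{y})^2\}.$$

Combining we have 

$$\begin{aligned}
\text{First Term} &= S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})^4\}+S\{n_{x_p}\sigma_{n_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\}-N\sigma_y^2\sigma_M^2\\
&= N\{\lambda_4+\sigma_y^2\sigma_M^2(1-\eta^2)\,\chi_1-\sigma_y^2\sigma_M^2\} \qquad \text{(xxix.)}.
\end{aligned}$$

We now turn to the second term, which involves the discovery of $R_{n_{y_s}\bar{y}_{x_p}}$. 

$$\delta n_{y_s}\,\delta\bar{y}_{x_p} = (\delta n_{y_sx_1}+\delta n_{y_sx_2}+\delta n_{y_sx_3}+\dots)\,\delta\bar{y}_{x_p},$$

$$n_{x_p}\,\delta\bar{y}_{x_p} = -\bar{y}_{x_p}\,\delta n_{x_p}+S(\delta n_{x_py_u}\,y_u),$$

$$\begin{aligned}
n_{x_p}\,\delta n_{y_s}\,\delta\bar{y}_{x_p} ={}& -\bar{y}_{x_p}(\delta n_{y_sx_1}+\delta n_{y_sx_2}+\dots+\delta n_{y_sx_p}+\dots)\,\delta n_{x_p}\\
&+(\delta n_{y_sx_1}+\delta n_{y_sx_2}+\delta n_{y_sx_3}+\dots)\,S(\delta n_{x_py_u}\,y_u).
\end{aligned}$$

Sum for all random samples and divide by the number of such samples; we have 

$$n_{x_p}\Sigma_{n_{y_s}}\Sigma_{\bar{y}_{x_p}}R_{n_{y_s}\bar{y}_{x_p}} = -\bar{y}_{x_p}\Big(n_{x_py_s}-\frac{n_{x_p}n_{y_s}}{N}\Big)+n_{x_py_s}y_s-\frac{n_{y_s}}{N}\,S(n_{x_py_u}y_u).$$

Hence 

$$n_{x_p}\Sigma_{n_{y_s}}\Sigma_{\bar{y}_{x_p}}R_{n_{y_s}\bar{y}_{x_p}} = n_{x_py_s}(y_s-\bar{y}_{x_p}) \qquad \text{(xxx.)}.$$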
Substituting we have 

$$\text{Second Term} = 2S\{n_{x_py_s}(y_s-\bar{y}_{x_p})(y_s-\bar{y})^2(\bar{y}_{x_p}-\bar{y})\}.$$

Here again the summation is of a double character. 

Let us first take $x_p$ as constant and sum for every value of $y_s$. We may write 
$y_s-\bar{y}=(y_s-\bar{y}_{x_p})+(\bar{y}_{x_p}-\bar{y})$, and our first summation will be 

$$\begin{aligned}
&2(\bar{y}_{x_p}-\bar{y})\times S\big[n_{x_py_s}\{(y_s-\bar{y}_{x_p})^3+2(y_s-\bar{y}_{x_p})^2(\bar{y}_{x_p}-\bar{y})+(y_s-\bar{y}_{x_p})(\bar{y}_{x_p}-\bar{y})^2\}\big]\\
&\quad= 2(\bar{y}_{x_p}-\bar{y})\,n_{x_p}m_3+4(\bar{y}_{x_p}-\bar{y})^2\,n_{x_p}m_2+2(\bar{y}_{x_p}-\bar{y})^3\,S\{n_{x_py_s}(y_s-\bar{y}_{x_p})\},
\end{aligned}$$

where 

$$n_{x_p}m_q = S\{n_{x_py_s}(y_s-\bar{y}_{x_p})^q\}.$$

The last term vanishes, for $S(n_{x_py_s}y_s)=n_{x_p}\bar{y}_{x_p}$ by the definition of the mean. 
Hence 

$$\text{Second Term} = 2S\{n_{x_p}m_3(\bar{y}_{x_p}-\bar{y})\}+4S\{n_{x_p}\sigma_{n_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\}.$$

Here $m_3$ is the third moment of the $x_p$ array of $y$'s, which will probably be very 
small if the arrays are nearly symmetrical, and the first term clearly depends on the 
existence of a correlation between the skewness of the arrays and the magnitude 
of their means. 
We may write the first term then 

$$= 2N\sigma_{a_y}^3\sigma_M\times\chi_2 = 2N\sigma_y^3(1-\eta^2)^{3/2}\sigma_M\times\chi_2,$$

where $\chi_2$ is a purely numerical quantity, which for most cases will probably be very 
small or even zero. 

Thus we find: 

$$\text{Second Term} = 2N\sigma_y^3(1-\eta^2)^{3/2}\sigma_M\,\chi_2+4N\sigma_y^2\sigma_M^2(1-\eta^2)\,\chi_1 \qquad \text{(xxxi.)}.$$

We can now return to p. 16 and write down the full correlation between deviations 
in the values of $\sigma_y$ and $\sigma_M$ due to random sampling. Remembering that $\sigma_M=\eta\sigma_y$,* 
we find: 

$$\Sigma_{\sigma_M}\Sigma_{\sigma_y}R_{\sigma_y\sigma_M} = \frac{1}{4N\eta\sigma_y^2}\Big\{\lambda_4-\eta^2\sigma_y^4+5\eta^2\sigma_y^4(1-\eta^2)\,\chi_1+2\eta\sigma_y^4(1-\eta^2)^{3/2}\chi_2\Big\} \qquad \text{(xxxii.)}.$$


* It should be remembered that this definition of $\eta$ gives it invariably the positive sign. 
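Proposition X. To find the standard deviation of the values of the correlation 
ratio $\eta$ due to random sampling, i.e., to find the probable error of the correlation 
ratio $\eta$. 

We have 

$$\eta = \sigma_M/\sigma_y.$$

Hence 

$$\frac{\delta\eta}{\eta} = \frac{\delta\sigma_M}{\sigma_M}-\frac{\delta\sigma_y}{\sigma_y}.$$

Squaring, summing for all random samples and dividing by the number of such 
samples, we have: 

$$\frac{\Sigma_\eta^2}{\eta^2} = \frac{\Sigma_{\sigma_M}^2}{\sigma_M^2}+\frac{\Sigma_{\sigma_y}^2}{\sigma_y^2}-\frac{2\,\Sigma_{\sigma_M}\Sigma_{\sigma_y}R_{\sigma_y\sigma_M}}{\sigma_M\sigma_y}.$$

$\Sigma_{\sigma_M}^2$ is given by (xxvii.), $\Sigma_{\sigma_M}\Sigma_{\sigma_y}R_{\sigma_y\sigma_M}$ by (xxxii.) and $\Sigma_{\sigma_y}^2=\dfrac{\mu_4-\mu_2^2}{4N\mu_2}$ by a well-known 
formula.* 
Substituting, we have the complete value of $\Sigma_\eta$ given by: 

$$\frac{\Sigma_\eta^2}{\eta^2} = \frac{\lambda_4-\lambda_2^2}{4N\lambda_2^2}+\frac{(1-\eta^2)\,\chi_1}{N\eta^2}+\frac{\mu_4-\mu_2^2}{4N\mu_2^2}-\frac{1}{2N\eta^2}\Big\{\eta^4\frac{\lambda_4}{\lambda_2^2}-\eta^2+5\eta^2(1-\eta^2)\,\chi_1+2\eta(1-\eta^2)^{3/2}\chi_2\Big\},$$

or, after re-arranging, 

$$\Sigma_\eta^2 = \frac{(1-\eta^2)^2}{N}+\frac{\eta^2}{4N}\,\frac{\mu_4-3\mu_2^2}{\mu_2^2}+\frac{\eta^2(1-2\eta^2)}{4N}\,\frac{\lambda_4-3\lambda_2^2}{\lambda_2^2}+\frac{(1-\eta^2)(2-5\eta^2)}{2N}(\chi_1-1)-\frac{\eta(1-\eta^2)^{3/2}}{N}\,\chi_2 \qquad \text{(xxxiii.)}.$$

For normal correlation, $\mu_4=3\mu_2^2$. Further 

$$\bar{y}_{x_p}-\bar{y} = \frac{r\sigma_y}{\sigma_x}(x_p-\bar{x}),$$

and 

$$N\lambda_4 = S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})^4\} = \frac{r^4\sigma_y^4}{\sigma_x^4}\,S\{n_{x_p}(x_p-\bar{x})^4\} = \frac{r^4\sigma_y^4}{\sigma_x^4}\times 3N\sigma_x^4 = 3N\lambda_2^2.$$

Hence the second and third terms vanish. Further $\chi_1=1$ and $\chi_2=0$, while $\eta=r$. 
Hence we have 

$$\Sigma_\eta = \frac{1-r^2}{\sqrt{N}},$$

which agrees with the special result. 


* ‘Biometrika,’ vol. II., p. 276. 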
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In any other case, x2, XY; —1, (f44—3pg”)/prg”, (Ay — 3Q”)/A,” will probably be small and 
thus 
1 
322 (1a), 
Probable error of 
n='67449 (l—7)//N, nearly . . . . . . (xxxiv.). 


This simple form suffices for many practical cases. 
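
As a quick numerical check of (xxxiv.), the following minimal Python sketch evaluates the approximate probable error of the correlation ratio. The function name `probable_error_eta` is hypothetical and not part of the memoir; the inputs N = 679 and η = ·249,911 are those of the woodruff data of Illustration A, for which the text quotes ·0243 from this formula.

```python
import math


def probable_error_eta(eta: float, n: int) -> float:
    """Approximate probable error of the correlation ratio, formula (xxxiv.)."""
    return 0.67449 * (1.0 - eta ** 2) / math.sqrt(n)


# Values of Illustration A (woodruff): N = 679 whorls, eta = 0.249911.
pe = probable_error_eta(0.249911, 679)
print(round(pe, 4))
```

The constant ·67449 converts a standard deviation into a probable error, so the same function with the factor removed gives the approximate value of Σ_η itself.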

If greater exactitude is wanted, there is, however, no great labour in using
(xxxiii.). We find the means and standard deviations of each array.

Then $N\lambda_2$ and $N\lambda_4$ are the 2nd and 4th moments of the means of these arrays
about their mean.

$N\mu_2$ and $N\mu_4$ are the 2nd and 4th moments about the mean of the y-characters, and
will always be known for skew variation.

$\chi_1$ is defined by

$$\chi_1 = \frac{S\{n_{x_p}\,\sigma_{n_{x_p}}^{2}\,(\bar y_{x_p}-\bar y)^{2}\}}{N\sigma_y^{2}(1-\eta^{2})\,\sigma_M^{2}} \qquad \text{(xxxv.)},$$

and can be easily found when the means and standard deviations of each array have
been found.

The most troublesome expression is $\chi_2$ defined by

$$\chi_2 = \frac{S\{n_{x_p}\,m_3\,(\bar y_{x_p}-\bar y)\}}{N\sigma_y^{3}(1-\eta^{2})^{3/2}\,\sigma_M} \qquad \text{(xxxvi.)}.$$

But as we do not take usually more than 10 to 20 arrays, the discovery of their
3rd moments is not an extremely difficult task. As a rule, however, $\chi_2$ is very small
and may be fairly neglected, even when we must find $\chi_1-1$. All these points will
be dealt with in the numerical illustrations given later in this paper. At present
we note that the probable error of $\eta$ has been determined, and that its value for the
general case is not really more complex than the value of the probable error of $r$ in
the general case, which requires the determination of product moments of the 4th
order.*


* Let $Np_{qs}=S\{n_{xy}(x-\bar x)^{q}(y-\bar y)^{s}\}$, then the probable error of $r$ is given by

$$\Sigma_r = \frac{r}{\sqrt{N}}\left\{\frac{p_{22}-3p_{11}^{2}}{p_{11}^{2}} + \frac{p_{22}-3p_{20}p_{02}}{2p_{20}p_{02}} + \frac{p_{40}-3p_{20}^{2}}{4p_{20}^{2}} + \frac{p_{04}-3p_{02}^{2}}{4p_{02}^{2}} - \frac{p_{31}-3p_{11}p_{20}}{p_{11}p_{20}} - \frac{p_{13}-3p_{11}p_{02}}{p_{11}p_{02}}\right\}^{1/2}.$$

This agrees with the value given by SHEPPARD (‘Phil. Trans.,’ A, vol. 192, p. 128), except that the $r^2$
factor has been dropped by a printer’s error in his paper. For the special case of a normal distribution, we
have easily from the equation to the normal surface

$$p_{40}=3p_{20}^{2},\quad p_{04}=3p_{02}^{2},\quad p_{31}=3p_{11}p_{20},\quad p_{13}=3p_{11}p_{02},\quad (p_{22}-3p_{11}^{2})/p_{11}^{2}=(1-r^{2})/r^{2}$$

and

$$\frac{p_{22}-3p_{20}p_{02}}{2p_{20}p_{02}} = r^{2}-1, \quad\text{whence}\quad \Sigma_r=(1-r^{2})/\sqrt{N},$$

the well-known form (‘Phil. Trans.,’ A, vol. 191, p. 245).
(4.) On the Higher Types of Regression.


We have already seen how the introduction of the correlation ratio $\eta$ enables us to
drop the limitations associated with the Gauss-Laplacian form of frequency, and the
Bravais correlation formulæ. The fundamental step towards this advance was
undoubtedly taken by G. U. YULE in his paper in the ‘Roy. Soc. Proc.,’ vol. 60,
pp. 477 et seq., wherein he shows that if the regression be linear, the Bravais type of 
formula applied to multiple correlation is still true, although we make no assumption 
as to the form of the frequency surface. It would undoubtedly be a gain to have 
skew frequency surfaces which would describe skew correlation for the great mass of 
cases as effectively as the series of skew frequency curves describe skew variation, but
although a considerable amount of progress has been made in the consideration of
these surfaces, their full theory has not yet been worked out owing to difficulties
of analysis, and their complete discussion must still be postponed. YULE’s method
of approaching the problem from the form of the regression curves is, however, 
available and capable of very great extension. Its chief advantage is that it 
makes little or no assumption as to the distribution of frequency ; its chief defect 
lies even in this advantage of generality: it does not enable us to predict the 
probability of an individual with a given combination of characters. This follows at 
once from the fact that we make no assumption as to the form of the distribution 
within an array. Without some theory as to variation within the array, we are 
reduced to the laborious process of calculating the standard deviation, skewness, and 
other general characters of each array, a lengthy and troublesome process compared 
with a theory which would, like the Bravais theory, give these at once in terms of a 
few constants determined from the data as a whole. 

In the great bulk of biometrical and economical enquiries, however, the regression 
does not diverge very markedly from the linear form. In the cases of non-linear 
regression that I have hitherto had to deal with, I find that parabolæ of the 2nd
or 3rd order will suffice as a rule to describe the deviation from linearity. If
they did not, we could, of course, use curves of higher orders, but the difficulty 
referred to in the first section of this paper at once arises: we then need to use 
in the determination moments and product-moments of such high orders that the 
probable errors of the constants are so high as to render valueless their calculation 
from such statistical data as we can hope for in most actual inquiries. In the great 
bulk of investigations it is practically impossible to increase our random samples 
from 500 to 1,000 individuals up to 50,000 to 100,000. Nor in the great 
bulk of statistical cases is any such increase even desirable, for a fairly wide 
experience shows that 2nd and 3rd order parabolæ amply suffice to describe the
skewness of the regression line. I shall accordingly classify skew correlation in the 
following manner :— 
(a.) Linear Regression:

The mean of an x-array of y’s, i.e., $\bar y_{x_p}$, is given by

$$\bar y_{x_p} = a_0 + a_1x_p \qquad \text{(xxxviii.)}.$$

(b.) Parabolic* Regression:

The mean of an x-array of y’s, i.e., $\bar y_{x_p}$, is given by

$$\bar y_{x_p} = a_0 + a_1x_p + a_2x_p^{2} \qquad \text{(xxxix.)}.$$

(c.) Cubical* Regression:

The mean of an x-array of y’s, i.e., $\bar y_{x_p}$, is given by

$$\bar y_{x_p} = a_0 + a_1x_p + a_2x_p^{2} + a_3x_p^{3} \qquad \text{(xl.)}.$$


It is conceivable—in fact, from unpublished work already done, highly probable— 
that the theory of skew variation will give regression curves, not of the exact form 
involved in (xxxix.) or (xl.), but containing product terms in x and y. The most 
general equation to a regression curve may be taken to be of the type 


$$\bar y_{x_p}-\bar y = f(x_p-\bar x),$$

and what experience shows us is: that for the great bulk of vital phenomena it is
sufficient to expand by MACLAURIN’s theorem and keep the first three or four terms.
Indeed, in the large majority of cases, (xxxviii.) alone suffices. Hence, if (xxxix.) 
or (xl.) fit the data within the limits of random sampling, we are not injudiciously 
circumscribing future developments of the theory of skew correlation by casting our 
regression curves into the above forms. I shall deal first with the theory of cubical 
regression, for we can then obtain from this the conditions necessary for parabolic 
and linear regressions. 

I must remind the reader, however, that the form of the regression line does not in 
any way limit the nature of the distribution of the array about its mean; the 
variability of an array, i.e., the standard deviation of an array, having for its mean
value $\sigma_y\sqrt{1-\eta^2}$, may or may not be the same for all arrays. If it is the same, or all
arrays are equally scattered about their means, I shall speak of the system as a
homoscedastic system, otherwise it is a heteroscedastic system. The Gauss-Laplacian
correlation surface gives a homoscedastic linear system. Mr. YULE’s linear regression
is not necessarily homoscedastic; it may, however, be homoscedastic without being
normal, and then the scatter of each array is measured by $\sigma_y\sqrt{1-r^2}$. When a
system is homoscedastic, but not linear, then $\sigma_{n_{x_p}}^{2}=\sigma_y^{2}(1-\eta^2)$, and consequently the
$\chi_1$ of (xxxv.) is equal to unity. $\chi_1=1$ is a necessary result of homoscedasticity.

Lastly, we want a word to express the idea of all the arrays having equal skewness, 


* ‘Parabolic’ and ‘cubical’ are here used in the narrower sense of regression curves corresponding to
ordinary parabolæ of the 2nd order and of the 3rd order respectively: in both cases the axis of the
parabola being parallel to the axis of the y-character.

or being asymmetrical in an equal degree about their means. I shall express this by
the term homoclitic; generally the arrays will not be equally asymmetrical round their
means, and in this case we shall speak of them as heteroclitic. If there were no
skewness in any of the arrays, then $m_3$ of (xxxvi.) would be zero for all of them.
I term arrays of no skewness isocurtic, and skew arrays allocurtic. If we supposed
that a curve of Type III. would sufficiently express the skewness of an array, we
should have

$$\text{Sk.} = \tfrac{1}{2}\,m_3/\sigma_{n_{x_p}}^{3},$$

and therefore from (xxxvi.)

$$\chi_2 = \frac{2S\{n_{x_p}\,\sigma_{n_{x_p}}^{3}\,(\text{Sk.})\,(\bar y_{x_p}-\bar y)\}}{N\sigma_y^{3}(1-\eta^{2})^{3/2}\,\sigma_M} \qquad \text{(xli.)}.$$

For a homoscedastic system we have $\sigma_{n_{x_p}}=\sigma_y\sqrt{1-\eta^2}$, and therefore

$$\chi_2 = \frac{2S\{n_{x_p}\,(\text{Sk.})\,(\bar y_{x_p}-\bar y)\}}{N\sigma_M},$$

and for a homoclitic system

$$\chi_2 = \frac{2(\text{Sk.})\,S\{n_{x_p}\,\sigma_{n_{x_p}}^{3}\,(\bar y_{x_p}-\bar y)\}}{N\sigma_y^{3}(1-\eta^{2})^{3/2}\,\sigma_M}.$$

For a homoclitic homoscedastic system, whether isocurtic or allocurtic,

$$\chi_2 = \frac{2(\text{Sk.})\,S\{n_{x_p}(\bar y_{x_p}-\bar y)\}}{N\sigma_M} = 0.$$

Thus $\chi_2$ is to a certain extent a measure of both homoscedasticity and homoclisy.
But as the correlation between $\sigma_{n_{x_p}}$ and $\bar y_{x_p}-\bar y$ is in most cases extremely small, while
the skewness of the array can well change its sign with arrays above or below the
mean, we can fairly consider the smallness of $\chi_2$ to be a measure of the approach to
homoclisy. I am thus inclined to speak of $\chi_1-1$ and $\chi_2$ as measures of heteroscedasticity
and heteroclisy. When they both vanish we have a homoscedastic homoclitic system.
For such systems $\eta$, the correlation ratio, tells us effectively the scatter of any array,
and as a rule all we want to know, in addition, is the form of the regression line.


(5.) Cubical Regression. 


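We have already used the following notation

$$Np_{qq'} = S\{n_{xy}(x-\bar x)^{q}(y-\bar y)^{q'}\} \qquad \text{(xlii.)}.$$

We shall shorten our formulæ if we write

$$r=p_{11}/(\sigma_x\sigma_y),\quad \epsilon=p_{21}/(\sigma_x^{2}\sigma_y),\quad \zeta=p_{31}/(\sigma_x^{3}\sigma_y),\quad \theta=p_{41}/(\sigma_x^{4}\sigma_y) \qquad \text{(xliii.)}.$$

We have already used $\mu_q$ to denote $p_{0q}$, and we shall use $\nu_q$ for $p_{q0}$. Further, we
write

$$\beta_1=\nu_3^{2}/\nu_2^{3},\quad \beta_2=\nu_4/\nu_2^{2},\quad \beta_3=\nu_3\nu_5/\nu_2^{4},\quad \beta_4=\nu_6/\nu_2^{3} \qquad \text{(xliv.)}.$$

$\sqrt{\beta_1}=\nu_3/\sigma_x^{3}$ will be of the same sign as $\nu_3$. These constants $\beta$ have been previously
used in the theory of skew variation.*

We shall further put

$$e=\epsilon-r\sqrt{\beta_1},\quad f=\zeta-r\beta_2,\quad \vartheta=\theta-r\beta_3/\sqrt{\beta_1} \qquad \text{(xlv.)}.$$

The regularity of the forms $e$, $f$, $\vartheta$ is rather screened by the above notation, which
is introduced for brevity; using the $p_{qq'}$ notation, we have

$$e=\frac{p_{21}p_{20}-p_{11}p_{30}}{\sigma_x^{4}\sigma_y},\quad f=\frac{p_{31}p_{20}-p_{11}p_{40}}{\sigma_x^{5}\sigma_y},\quad \vartheta=\frac{p_{41}p_{20}-p_{11}p_{50}}{\sigma_x^{6}\sigma_y} \qquad \text{(xlvi.)},$$

whence the law of formation of these constants is easily seen.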
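The regression curve may now be conveniently put into the form

$$\frac{\bar y_{x_p}-\bar y}{\sigma_y} = b_0 + b_1\frac{x_p-\bar x}{\sigma_x} + b_2\left(\frac{x_p-\bar x}{\sigma_x}\right)^{2} + b_3\left(\frac{x_p-\bar x}{\sigma_x}\right)^{3} \qquad \text{(xlvii.)}.$$

Or, multiplying by $n_{x_p}$ and summing for all arrays,

$$0 = Nb_0 + b_2N + b_3N\sqrt{\beta_1},$$

the sign of $\sqrt{\beta_1}$ being always that of the 3rd moment. Hence, measuring from
the means of the two characters, i.e., $X_p=x_p-\bar x$, $Y_{x_p}=\bar y_{x_p}-\bar y$, we may re-write (xlvii.)

$$Y_{x_p}/\sigma_y = b_1(X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^{2}-1\} + b_3\{(X_p/\sigma_x)^{3}-\sqrt{\beta_1}\} \qquad \text{(xlviii.)}.$$

Now multiply by $n_{x_p}X_p/\sigma_x$ and sum for all arrays, remembering that

$$Nr\sigma_x\sigma_y = S(n_{xy}XY) = S(n_{x_p}X_pY_{x_p}),$$

we find

$$r = b_1 + b_2\sqrt{\beta_1} + b_3\beta_2.$$

This enables us to get rid of $b_1$ and write (xlviii.)

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\} + b_3\{(X_p/\sigma_x)^{3}-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} \qquad \text{(xlix.)}.$$

Now multiply by $n_{x_p}(X_p/\sigma_x)^{2}$ and sum for all arrays. We have

$$\epsilon = r\sqrt{\beta_1} + b_2(\beta_2-\beta_1-1) + b_3(\beta_3/\sqrt{\beta_1}-\beta_2\sqrt{\beta_1}-\sqrt{\beta_1}),$$

or

$$e = b_2\phi_2 + b_3\phi_3 \qquad \text{(l.)},$$

where

$$\phi_2 = \beta_2-\beta_1-1,\qquad \phi_3 = (\beta_3-\beta_1\beta_2-\beta_1)/\sqrt{\beta_1} \qquad \text{(li.)}.$$


* ‘Phil. Trans.,’ A, vol. 186, p. 368, and A, vol. 198, p. 278.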
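Eliminating $b_2$, we can write (xlix.)

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{e}{\phi_2}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}$$
$$\qquad + b_3\left[\{(X_p/\sigma_x)^{3}-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} - \frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}\right] \qquad \text{(lii.)}.$$

Now multiply by $n_{x_p}(X_p/\sigma_x)^{3}$ and sum for all arrays; we find

$$\zeta = r\beta_2 + \frac{e}{\phi_2}\phi_3 + b_3(\phi_4-\phi_3^{2}/\phi_2),$$

or

$$(f\phi_2-e\phi_3)/(\phi_2\phi_4-\phi_3^{2}) = b_3 \qquad \text{(liii.)},$$

where

$$\phi_4 = \beta_4-\beta_2^{2}-\beta_1 \qquad \text{(liv.)}.$$

It follows from (l.) that

$$b_2 = (e\phi_4-f\phi_3)/(\phi_2\phi_4-\phi_3^{2}) \qquad \text{(lv.)}.$$

We can thus write the cubic regression curve in either of the forms*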


* The method is perfectly easy of extension, if we choose to use higher products and moments, to a
regression curve of any order, e.g.,

$$Y_{x_p}/\sigma_y = b_0 + b_1(X_p/\sigma_x) + b_2(X_p/\sigma_x)^{2} + \dots + b_n(X_p/\sigma_x)^{n}.$$

For let $N\epsilon_{q1}=S(n_{x_p}Y_{x_p}X_p^{q})/(\sigma_x^{q}\sigma_y)$, and $\gamma_q=\nu_q/\sigma_x^{q}=S(n_{x_p}X_p^{q})/(N\sigma_x^{q})$,
we have:

$$0 = b_0 + 0\times b_1 + b_2 + \gamma_3b_3 + \dots + \gamma_nb_n + \dots$$
$$\epsilon_{11} = 0\times b_0 + b_1 + \gamma_3b_2 + \gamma_4b_3 + \dots + \gamma_{n+1}b_n + \dots$$
$$\epsilon_{21} = b_0 + \gamma_3b_1 + \gamma_4b_2 + \gamma_5b_3 + \dots + \gamma_{n+2}b_n + \dots$$
$$\epsilon_{p1} = \gamma_pb_0 + \gamma_{p+1}b_1 + \gamma_{p+2}b_2 + \gamma_{p+3}b_3 + \dots + \gamma_{n+p}b_n + \dots$$

Hence writing $\epsilon_{01}$ for 0, $\gamma_0=1$, $\gamma_1=0$, $\gamma_2=1$, we have

$$b_n = (\epsilon_{01}\Delta_{0n} + \epsilon_{11}\Delta_{1n} + \epsilon_{21}\Delta_{2n} + \dots + \epsilon_{p1}\Delta_{pn} + \dots)/\Delta,$$

where

$$\Delta = \begin{vmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \gamma_3 & \dots & \gamma_n \\ \gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 & \dots & \gamma_{n+1} \\ \gamma_2 & \gamma_3 & \gamma_4 & \gamma_5 & \dots & \gamma_{n+2} \\ \vdots & & & & & \vdots \\ \gamma_p & \gamma_{p+1} & \gamma_{p+2} & \gamma_{p+3} & \dots & \gamma_{p+n} \end{vmatrix}$$

and $\Delta_{qn}$ is the minor of the constituent in the $(q+1)$th row and $(n+1)$th column. As we have already
noted, however, solutions involving anything beyond $\gamma_6$ are hardly likely to be of practical value.

The value above for $b_n$ is the type equation given by the method of least squares, when we strike the
best fitting curve to all the entries in the correlation table. I have already pointed out that the method
of moments becomes identical with that of least squares, when we fit parabolæ of any order (‘Biometrika,’
vol. I., p. 271). The retention of the method of moments, however, enables us, without abrupt change of
method, to introduce the needful $\eta$, and to grasp at once the application of the proper SHEPPARD’S corrections.
The extension of the method of least squares to continua in space has not yet, as far as I am aware,
been fully considered.
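$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{e}{\phi_2}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}$$
$$\qquad + \frac{f\phi_2-e\phi_3}{\phi_2\phi_4-\phi_3^{2}}\left[\{(X_p/\sigma_x)^{3}-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} - \frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}\right] \qquad \text{(lvi.)},$$

or

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{e\phi_4-f\phi_3}{\phi_2\phi_4-\phi_3^{2}}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}$$
$$\qquad + \frac{f\phi_2-e\phi_3}{\phi_2\phi_4-\phi_3^{2}}\{(X_p/\sigma_x)^{3}-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} \qquad \text{(lvi.) bis}.$$

The former arrangement of the solution, while it is apparently more cumbersome,
is, perhaps, the better, for it gives us at once the measure of the deviation from
parabolic or 2nd order regression, i.e., the approach of $f\phi_2-e\phi_3$ to zero. In the case
of normal correlation both $e$ and $f$ vanish, and neglecting higher terms the condition
for linear regression is that $e=0$, and $f\phi_2-e\phi_3=0$, or, again, $e$ and $f=0$. For
material in which the x-variability is isocurtic, $\beta_1=\beta_3=\phi_3=0$, and the regression
curve takes the simple form

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{e}{\phi_2}\{(X_p/\sigma_x)^{2}-1\} + \frac{f}{\phi_4}\{(X_p/\sigma_x)^{3}-\beta_2(X_p/\sigma_x)\}.$$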

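We now turn to express these relations in terms of the correlation ratio $\eta$.
Multiply (lvi.) by $n_{x_p}Y_{x_p}/\sigma_y$, and sum for all arrays, we obtain

$$\eta^{2} = r^{2} + \frac{e}{\phi_2}(\epsilon-\sqrt{\beta_1}\,r) + \frac{f\phi_2-e\phi_3}{\phi_2\phi_4-\phi_3^{2}}\left\{(\zeta-\beta_2r) - \frac{\phi_3}{\phi_2}(\epsilon-\sqrt{\beta_1}\,r)\right\},$$

whence results

$$\phi_2(\eta^{2}-r^{2})-e^{2} = (f\phi_2-e\phi_3)^{2}/(\phi_2\phi_4-\phi_3^{2}) \qquad \text{(lvii.)}.$$

(lvii.) is a necessary condition of cubical regression.

It is of course not a sufficient condition, as we ought to show that $b_4$, $b_5$, &c., all
vanish, and thus any number of conditions may be found. For example, multiply by
$n_{x_p}X_p^{4}/\sigma_x^{4}$ and sum for all arrays, then

$$\vartheta = \frac{e}{\phi_2}(\beta_4-\beta_3-\beta_2) + \frac{f\phi_2-e\phi_3}{\phi_2\phi_4-\phi_3^{2}}\left\{\frac{\beta_5-\beta_2\beta_3-\beta_1\beta_2}{\sqrt{\beta_1}} - \frac{\phi_3}{\phi_2}(\beta_4-\beta_3-\beta_2)\right\} \qquad \text{(lviii.)}$$

is also a necessary condition. Here $\beta_5=\nu_3\nu_7/\sigma_x^{10}$. But the high as well as complicated
value of the probable errors of such expressions renders it idle to consider them in
practice.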
Substituting (lvii.) in (lvi.) we have:

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{e}{\phi_2}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}$$
$$\qquad \pm\sqrt{\frac{\phi_2(\eta^{2}-r^{2})-e^{2}}{\phi_2\phi_4-\phi_3^{2}}}\left[\{(X_p/\sigma_x)^{3}-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} - \frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}\right] \qquad \text{(lix.)}.$$

Which sign is to be given to the root will often be visible on inspection of the
observations. Otherwise the sign of the root must be the same as that of
$f\phi_2-e\phi_3$.

(lix.) will save the calculation of $\zeta$ if the root-sign can be found by inspection.
Finally there is a third form into which we may put the cubic. Eliminate $\phi_2\phi_4-\phi_3^{2}$
from (lix.) by aid of (lvii.) and it becomes

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{ef-\phi_3(\eta^{2}-r^{2})}{f\phi_2-e\phi_3}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\}$$
$$\qquad + \frac{\phi_2(\eta^{2}-r^{2})-e^{2}}{f\phi_2-e\phi_3}\{(X_p/\sigma_x)^{3}-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} \qquad \text{(lx.)}.$$

At first sight this might appear to be the best form of the cubic, because it does
not involve the 6th moment of the variable x. But this is very far from being the
case in actual practice. The reason is simply this: $e$, $f$ and $\eta^{2}-r^{2}$ are in most cases
very small (they vanish in normal correlation) relatively to $\phi_2$ and $\phi_3$. Hence both
numerators and denominators of the coefficients of the square and cubic terms are
the ratio of small quantities, and accordingly subject to large probable errors. For
this reason (lx.) was found in actual practice to be of no service. Of the other two
forms (lvi.) and (lix.), neither of which suffers from this defect, $\phi_2\phi_4-\phi_3^{2}$ being always
large relative to the numerators, (lix.) while involving a 6th moment does not
involve a 4th product, $\zeta$, and experience shows that the former is on the whole
easier to determine and more exact than the latter. Hence (lix.) seems the preferable
form, even if it be needful in certain cases to determine $\zeta$ in order to fix the
sign of the radical. The cubic regression curve thus demands a knowledge of the
correlation ratio $\eta$, of the “cubic product” $\epsilon$ and the sign by inspection or calculation
of $f\phi_2-e\phi_3$. Besides this, we require the first six moments of the independent
variable x. Of course if the regression of x on y be required, as well as that of
y on x, the second correlation ratio and cubic product as well as the first six moments
of y must be found. It is rare, however, that both regression curves are needed for
a single enquiry.

As to the general form of (lix.), we note that there will always be a real point of
inflexion given by

$$X_p/\sigma_x = \tfrac{1}{3}(\phi_3b_3-e)/(\phi_2b_3) \qquad \text{(lxi.)},$$

where

$$b_3 = \pm\sqrt{\{\phi_2(\eta^{2}-r^{2})-e^{2}\}/(\phi_2\phi_4-\phi_3^{2})},$$

and further that there may be two points of horizontality given by a certain quadratic.
Thus, in general, the regression line will tend to be part of an S-shaped curve. The
horizontal points may be imaginary, or, if real, either they or the point of inflexion
may be far beyond the portion of the curve which crosses the observed field of
frequency. If we consider, however, the slope of the regression curve to measure
the regression in the neighbourhood of any point, we note that the regression is a
maximum at the point given by (lxi.), and grows smaller and smaller towards the two
points of horizontality, i.e., points of complete local independence of the two
characters. These are not unfamiliar features in certain practical cases of skew
correlation,* and accordingly the cubic regression curve provides us with a ready
means of describing regression phenomena, which cannot be dealt with by the simple
line or the parabola.

It may of course be suggested that a quartic or quintic curve would give a
better result than a cubic. The answer to this is: Possibly, but the high moments
and products required render it impossible to deal even superficially with the probable
errors of the constants involved. The calculation of the probable error of $\eta$ is a
sufficiently stiff task in the general case. To test the probable error of a condition
like (lvii.), to say nothing of one like (lviii.), would involve an immense amount of
work, since we should want the correlation of errors in $\eta$, $\epsilon$, $\zeta$, and $\theta$. Speaking with
some experience of practical statistical possibilities, I think the tendency to use very
high moments or product-moments must be curtailed to the minimum of actual needs.
We cannot deny the existence of skew variation, nor of the sensible curvature of
regression lines. We must admit their existence as the result of statistical experience.
This existence involves a great widening of the old frequency notions and the need
for a new means of description. But we must remember that statistics are essentially
a practical study, the art of describing by a few numerical constants observational
experience, and we must curtail at every turn the desire to run riot in mathematical
formulæ, which cannot be generally applied in actual practice.† Still I propose later
in this paper to deal with the general formulæ for quartic regression.


(6.) Parabolic Regression. 


For a parabolic system $b_3$ must vanish, or nearly vanish. Hence we have from
(liii.) and (lvii.)

$$f\phi_2-e\phi_3 = 0 \qquad \text{(lxii.)},$$
$$\phi_2(\eta^{2}-r^{2})-e^{2} = 0 \qquad \text{(lxiii.)}.$$


* Compare for example the regression line of mean age of bridegroom for actual age of bride,
which gives a typical S-shaped curve. See ‘Biometrika,’ vol. II., p. 20.
† These remarks have special reference to the points dealt with on p. 6.

From these conditions we find

$$b_2 = e/\phi_2 = \pm\sqrt{(\eta^{2}-r^{2})/\phi_2}.$$

These give for the form of the parabolic regression curve

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{e}{\phi_2}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\} \qquad \text{(lxiv.)},$$

or

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) \pm\sqrt{\frac{\eta^{2}-r^{2}}{\phi_2}}\{(X_p/\sigma_x)^{2}-\sqrt{\beta_1}(X_p/\sigma_x)-1\} \qquad \text{(lxv.)}.$$

The latter form, besides the correlation coefficient and correlation ratio, requires only
a knowledge of the skew variation constants $\beta_1$ and $\beta_2$, and is therefore very easy to
determine. Except for very nearly linear regression, there can be no doubt as to the
sign of $\sqrt{\eta^{2}-r^{2}}$, as we can tell at once whether the parabola ought to be concave or
convex to the x-axis. In other cases the sign of $\sqrt{\eta^{2}-r^{2}}$ must be taken to coincide
with that of $e$, which must therefore be found. It will then be as easy to use (lxiv.)
as (lxv.), although probably $\eta$ and $r$ can be found with less error than $e$.

It is thus quite easy to allow for such curvature of the regression line as can be
expressed by a parabola of the 2nd order of the type considered.

We notice at once that the regression curve does not pass through the mean of the 
two characters. Or, an individual with the mean of one character will most probably 
not have the mean of a second character. This is a rather important result, which 
follows at once for nearly all types of skew correlation. 

It will be seen, for example, that QUETELET’s “mean man,” defended by Professor
EDGEWORTH as theoretically justifiable, depends entirely on human characters giving
linear regression curves. Such linear curves are certainly given by many pairs of
characters, e.g., cranial and body measurements, but there are certainly other
characters for which regression ceases to be sensibly linear, and the conception of the 
“mean man” in this case fails. For example, if age be considered as a character, 
then the regression is certainly not linear, and the individual of mean age will not 
necessarily have either the mean physical or psychical characters. This seems of 
some importance for the general conception of “ type,” if by type we denote the mean, 
for probably there are other characters than age for which regression is skew. 

The regression, i.e., $dY_{x_p}/dX_p$, will be zero, for a point $X_{p\,\text{max.}}$, for which

$$\frac{X_{p\,\text{max.}}}{\sigma_x} = \frac{1}{2}\left\{\sqrt{\beta_1} \mp \frac{r\sqrt{\beta_2-\beta_1-1}}{\sqrt{\eta^{2}-r^{2}}}\right\} \qquad \text{(lxvi.)},$$

the sign of the root being determined as before. Clearly, therefore, unless $r$ be very
small, or $\eta^{2}$ diverges very sensibly from $r^{2}$, this point of zero regression may correspond
to a very large abscissa, and in some cases will lie entirely outside the range of
observable frequency.

The parabola of regression cuts the line of regression, i.e., the line of best fit to
the series of regression points, or to the means of the x-arrays, in two points
determined by the quadratic equation

$$(X_p/\sigma_x)^{2} - \sqrt{\beta_1}\,(X_p/\sigma_x) - 1 = 0,$$

or

$$X_p/\sigma_x = \tfrac{1}{2}\{\sqrt{\beta_1} \pm \sqrt{\beta_1+4}\} \qquad \text{(lxvii.)}.$$

These points are always real, and correspond, if regression be truly parabolic, to
the same values of the x-character, whatever be the y-character of which we are
considering the correlation. In the case of normal variation of the x-character
only, these are the points of inflexion of the x-distribution.
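
The quadratic above is easily evaluated; the following minimal Python sketch (the function name `parabola_line_crossings` is hypothetical) returns the two standardized abscissæ $X_p/\sigma_x$ at which the parabola of regression meets the line of regression. It is tried on the normal case, $\sqrt{\beta_1}=0$, and on the value $\sqrt{\beta_1}=+\cdot 130{,}487$ found for the woodruff positions in Illustration A.

```python
import math


def parabola_line_crossings(root_beta1: float) -> tuple:
    """Roots of xi**2 - root_beta1*xi - 1 = 0, where xi = X_p / sigma_x."""
    disc = math.sqrt(root_beta1 ** 2 + 4.0)  # always real, since beta_1 + 4 > 0
    return 0.5 * (root_beta1 - disc), 0.5 * (root_beta1 + disc)


# Normal x-variation: the crossings fall at one standard deviation either side.
normal_case = parabola_line_crossings(0.0)

# Skew x-variation of Illustration A (sqrt(beta_1) = +0.130487).
skew_case = parabola_line_crossings(0.130487)
print(normal_case, skew_case)
```

Since the product of the roots is −1 whatever the skewness, the two crossings always lie on opposite sides of the mean of the x-character.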


(7.) Linear Regression. 


In this case it is necessary that both $b_2$ and $b_3$ vanish within the limits of random
sampling, and, although these are not theoretically sufficient, for a whole series of
relations between the higher product-moments could be written down,* they are for
practical purposes sufficient.

Hence we have the following conditions for linear regression:

$$\eta^{2}-r^{2} = 0 \qquad \text{(lxviii.)},$$

or, the coefficient of correlation, without regard to sign, should be equal to the
correlation ratio. Further $e$ should be zero, or

$$p_{21}p_{20}-p_{11}p_{30} = 0 \qquad \text{(lxix.)}.$$

The theory of linear regression is so familiar that it need not be further discussed
here. In the actual practice of statistics, the determination of the means of the
x-arrays and the drawing of the regression line will often suffice to show the fairly
trained eye whether the deviations from it are random or not. If they are not
random, then we must proceed to the determination of $\eta$ and of the higher product-moments.

The following are numerical examples of skew correlation, selected to illustrate the
theory developed above.


* For example, it is necessary in most cases that $f$ should vanish. In the instance of that very special
case of linear regression, the Gauss-Laplacian normal frequency, it is easy to show that the constants $f$, $\vartheta$
both vanish as well as $\eta^{2}=r^{2}$.


SKEW CORRELATION AND NON-LINEAR REGRESSION. 31 


STATISTICAL ILLUSTRATIONS. 


(8.) Illustration A.—On the Skew Correlation between Number of Branches to the
Whorl and Position of the Whorl on the Spray in the case of Asperula odorata.


In this case the material was collected in a lane near Horsham, Sussex, at
Whitsuntide, 1903, by Miss M. RADFORD. There were 150 independent sprays, the
woodruff had just flowered, and the whorls were counted from the flower downwards. 
Being early in the season, the maximum number of whorls was five, and, in some 
cases, not even as many were available. The material was counted and tabled by 
the author, and the results are exhibited in the table below :— 


TABLE I.—Correlation of Whorl-Branches and Position of Whorl.

(Columns 4 to 8: number of branches in whorl.)

| Whorl ($x$) | 4 | 5  | 6   | 7   | 8   | $n_{x_p}$ | $\bar y_{x_p}$ | $\sigma_{n_{x_p}}$ | $m_2$ | $m_3$ |
| First       | — | 3  | 66  | 42  | 39  | 150 | 6·7800 | ·8553 | ·7316 | ·1535 |
| Second      | — | 3  | 61  | 47  | 39  | 150 | 6·8133 | ·8437 | ·7117 | ·0985 |
| Third       | — | 6  | 60  | 40  | 44  | 150 | 6·8133 | ·9047 | ·8185 | ·0383 |
| Fourth      | 1 | 12 | 68  | 39  | 22  | 142 | 6·4859 | ·8780 | ·7709 | ·1347 |
| Fifth       | 1 | 13 | 53  | 10  | 10  | 87  | 6·1724 | ·8605 | ·7404 | ·4049 |
| Totals      | 2 | 37 | 308 | 178 | 154 | 679 | 6·6554 | —     | —     | —     |


We require the regression curve giving the probable number of branches for a
given whorl.


Dealing first with the skew variation in position, a purely arbitrary system
depending solely on the number of whorls dealt with in each position, we find, not
using SHEPPARD’S correction,*

Mean = 2·802,651, $\nu_2$ = 1·787,268, $\nu_5$ = 2·799,638,
$\sigma_x$ = 1·336,887, $\nu_3$ = ·311,783, $\nu_6$ = 22·678,308,
$\nu_4$ = 5·841,682.

Hence we determine

$\beta_1$ = ·017,027, $\phi_2$ = ·811,740,
$\beta_2$ = 1·828,767, $\phi_3$ = ·286,465,
$\beta_3$ = ·085,545, $\phi_4$ = ·610,879,
$\beta_4$ = 3·972,295, and $\sqrt{\beta_1}$ = +·130,487.
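
The constants just tabled can be recomputed from the array totals of Table I. The sketch below is illustrative only: the helper name `central_moments` is hypothetical, and the formulæ assumed are $\beta_1=\nu_3^2/\nu_2^3$, $\beta_2=\nu_4/\nu_2^2$, $\beta_3=\nu_3\nu_5/\nu_2^4$, $\beta_4=\nu_6/\nu_2^3$, with $\phi_2=\beta_2-\beta_1-1$, $\phi_3=(\beta_3-\beta_1\beta_2-\beta_1)/\sqrt{\beta_1}$ and $\phi_4=\beta_4-\beta_2^2-\beta_1$. No SHEPPARD’S correction is applied, in agreement with the text.

```python
import math


def central_moments(values, counts, max_order=6):
    """Mean and central moments nu_2..nu_max_order of a grouped frequency."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    moments = {
        q: sum(c * (v - mean) ** q for v, c in zip(values, counts)) / n
        for q in range(2, max_order + 1)
    }
    return mean, moments


# Whorl positions 1..5 and the number of whorls counted at each (Table I).
positions = [1, 2, 3, 4, 5]
counts = [150, 150, 150, 142, 87]

mean_x, nu = central_moments(positions, counts)
beta1 = nu[3] ** 2 / nu[2] ** 3
beta2 = nu[4] / nu[2] ** 2
beta3 = nu[3] * nu[5] / nu[2] ** 4
beta4 = nu[6] / nu[2] ** 3
# sqrt(beta_1) carries the sign of the third moment.
root_beta1 = math.copysign(math.sqrt(beta1), nu[3])

phi2 = beta2 - beta1 - 1.0
phi3 = (beta3 - beta1 * beta2 - beta1) / root_beta1
phi4 = beta4 - beta2 ** 2 - beta1

for name, value in [("mean", mean_x), ("beta1", beta1), ("beta2", beta2),
                    ("beta3", beta3), ("beta4", beta4),
                    ("phi2", phi2), ("phi3", phi3), ("phi4", phi4)]:
    print(f"{name:6s} {value:.6f}")
```

Working from the grouped totals in this way avoids carrying rounded intermediate moments, which is the concern raised in the footnote about retaining six places.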


* The numbers are tabulated to six places, because we cannot be sure that the final calculations are for 
the data true to two places, which is all we finally retain unless this is done. Any number of figures can 
really be retained with perfect ease when the work is done on a calculator. 
We now turn to the skew variation in the number of branches to the whorl, and
get the following constants:

Mean = 6·655,375, $\mu_2$ = ·806,124,
$\sigma_y$ = ·897,842, $\mu_3$ = ·132,090,
$\mu_4$ = 1·138,410.

The values of $\bar y_{x_p}$, $m_2$ and $m_3$ are given in table above. Using them we find

$\sigma_M$ = ·224,377, $\eta$ = ·249,911, $\sigma_{a_y}=\sigma_y\sqrt{1-\eta^{2}}$ = ·869,355,
$\lambda_2=\sigma_M^{2}$ = ·050,345, $\lambda_4$ = ·007,474, $\chi_1$ = ·990,862, $\chi_2$ = −·059,851.

These give by (xxxiii.), showing the numerical contribution of each term,

$$\Sigma_\eta^{2} = \frac{1}{N}\{\cdot 878{,}991 - \cdot 010{,}323 - \cdot 000{,}888 - \cdot 007{,}231 + \cdot 013{,}578\},$$

or the probable error of $\eta$ = ·0242.

Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found for
its value ·0243. It is clear that for this special case the simple formula (xxxiv.) is
amply sufficient, the small terms almost cancelling.

We see that $\chi_1$ is almost unity, and the graph of $\sigma_{n_{x_p}}/\sigma_y$ shows indeed that the system
is sensibly homoscedastic. $\chi_2$ is small, but a glance at the graph of the clitic curve
on Diagram I. shows that we can hardly treat the system as homoclitic, the changes
in the skewness forming a fairly uniform curve.*

For practical purposes, we may treat the variability of the number of branches in
any array as sufficiently closely given by $\sigma_y\sqrt{1-\eta^{2}}$.
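
The correlation ratio and the measure of heteroscedasticity can be checked directly from the counts of Table I. The following Python sketch is offered under stated assumptions: $\eta=\sigma_M/\sigma_y$, and $\chi_1$ is taken as $S\{n_{x_p}\sigma_{n_{x_p}}^{2}(\bar y_{x_p}-\bar y)^{2}\}/\{N\sigma_y^{2}(1-\eta^{2})\sigma_M^{2}\}$; the names `table` and `array_summary` are hypothetical.

```python
import math

branches = [4, 5, 6, 7, 8]
# Whorl position -> number of whorls having 4, 5, 6, 7, 8 branches (Table I).
table = {
    1: [0, 3, 66, 42, 39],
    2: [0, 3, 61, 47, 39],
    3: [0, 6, 60, 40, 44],
    4: [1, 12, 68, 39, 22],
    5: [1, 13, 53, 10, 10],
}


def array_summary(counts):
    """Size, mean and second moment about the mean of one array."""
    n = sum(counts)
    mean = sum(b * c for b, c in zip(branches, counts)) / n
    m2 = sum(c * (b - mean) ** 2 for b, c in zip(branches, counts)) / n
    return n, mean, m2


arrays = [array_summary(c) for c in table.values()]
N = sum(n for n, _, _ in arrays)
grand_mean = sum(n * mean for n, mean, _ in arrays) / N

# Variance of the array means about the general mean (sigma_M squared).
sigma_M_sq = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in arrays) / N
# Total variance = variance of array means + mean variance within arrays.
mu2 = sigma_M_sq + sum(n * m2 for n, _, m2 in arrays) / N

eta = math.sqrt(sigma_M_sq / mu2)
chi1 = sum(n * m2 * (mean - grand_mean) ** 2 for n, mean, m2 in arrays) / (
    N * mu2 * (1.0 - eta ** 2) * sigma_M_sq
)
print(f"N = {N}, eta = {eta:.6f}, chi_1 = {chi1:.4f}")
```

A value of $\chi_1$ close to unity, as here, is what the text describes as a sensibly homoscedastic system.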

We now turn to the product-moments† and find

$p_{11}$ = −·249,160, $p_{31}$ = −·896,415,
$p_{21}$ = −·236,289, $p_{41}$ = −1·210,225,


* Throughout these illustrations the clitic curve is plotted by calculating the skewness of the arrays
from $\tfrac{1}{2}m_3/(m_2)^{3/2}$. See p. 23.

† In calculating these products referred to the centroid from those referred to any axes, generally
corresponding to whole numbers in the table, the following reduction formulæ will be found useful.
We take $N\Pi_{qs}=S(n_{xy}\,x'^{q}y'^{s})$, $x'$ and $y'$ being measured from any axes, further, $\bar x'$, $\bar y'$ are the distances of the
means from these axes, and $\nu_2$, $\nu_3$, $\nu_4$ the moments of the x-character about its mean as tabled above.

$$p_{11} = \Pi_{11}-\bar x'\Pi_{01},\qquad p_{21} = \Pi_{21}-2\bar x'\Pi_{11}+\bar x'^{2}\Pi_{01}-\bar y'\nu_2,$$
$$p_{31} = \Pi_{31}-3\bar x'\Pi_{21}+3\bar x'^{2}\Pi_{11}-\bar x'^{3}\Pi_{01}-\bar y'\nu_3,$$
$$p_{41} = \Pi_{41}-4\bar x'\Pi_{31}+6\bar x'^{2}\Pi_{21}-4\bar x'^{3}\Pi_{11}+\bar x'^{4}\Pi_{01}-\bar y'\nu_4.$$

The p’s should be further corrected for grouping by SHEPPARD’S corrections (given on my p. 36), provided
there be high contact at the contour of the surface of frequency. SHEPPARD’S corrections have not in this
These lead to

$r$ = −·207,579, $e$ = −·120,164, $f$ = −·038,241, $\vartheta$ = −·285,890.

Thus all the constants are determined.

We find

$$\eta^{2}-r^{2} = \cdot 019{,}367,$$
$$\phi_2(\eta^{2}-r^{2})-e^{2} = \cdot 001{,}281,$$
$$\phi_2(\eta^{2}-r^{2})-e^{2}-(f\phi_2-e\phi_3)^{2}/(\phi_2\phi_4-\phi_3^{2}) = \cdot 000{,}276.$$

These should be respectively zero for linear, parabolic, and cubical regressions. It
will be seen that they are satisfied with increasing closeness; we might well be
satisfied even with the parabolic regression curve. The following are the regression
curves determined, $y_{x_p}$ being the actual number of branches in the whorl
(= 6·655,375 + $Y_{x_p}$), and $x_p$ the actual position of the whorl:

(a.) Straight line:

$y_{x_p}$ = 7·046,087 − ·139,408 $x_p$.

(b.) Parabola from (lxv.):

$y_{x_p}$ = 6·794,052 − ·125,872 $X_p$ − ·077,592 $X_p^{2}$;
or,
$y_{x_p}$ = 6·853,561 − ·077,592 ($x_p$ − 1·991,535)$^{2}$.

This clearly gives a maximum number of branches, 6·8536, corresponding to
$x_p$ = 1·9915, a value within the limits of observation.

(c.) Cubic from (lix.):

$y_{x_p}$ = 6·799,399 − ·192,439 $X_p$ − ·084,230 $X_p^{2}$ + ·020,915 $X_p^{3}$.

Here $X_p$ is measured from the mean position = $x_p$ − 2·802,651, and $y_{x_p}$ is, as before,
the total number of branches for the given position.

Condition (lvii.) is so closely satisfied that we shall here get sensibly as good
results from (lix.) as from (lvi.).
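
The first two test quantities and the coefficients of the straight line and parabola can be recovered from the constants already found. This is a sketch under stated assumptions, not the author’s own computation: the line is taken as $Y/\sigma_y=r\,X/\sigma_x$, the parabola as (lxv.) with $b_2=\pm\sqrt{(\eta^{2}-r^{2})/\phi_2}$ carrying the sign of $e$, and all variable names are hypothetical. The inputs are the six-place values printed in this illustration.

```python
import math

# Constants of Illustration A as printed in the text.
r, eta, e = -0.207579, 0.249911, -0.120164
sigma_x, sigma_y = 1.336887, 0.897842
mean_x, mean_y = 2.802651, 6.655375
root_beta1, phi2 = 0.130487, 0.811740

# Quantities that should vanish for linear and for parabolic regression.
linear_test = eta ** 2 - r ** 2
parabolic_test = phi2 * linear_test - e ** 2

# (a.) Straight line in the actual position x_p.
slope = r * sigma_y / sigma_x
intercept = mean_y - slope * mean_x

# (b.) Parabola (lxv.) in X_p = x_p - mean_x; the root takes the sign of e.
b2 = math.copysign(math.sqrt(linear_test / phi2), e)
quad = sigma_y * b2 / sigma_x ** 2
lin = (sigma_y / sigma_x) * (r - b2 * root_beta1)
const = mean_y - sigma_y * b2

print(f"eta^2 - r^2           = {linear_test:.6f}")
print(f"phi2(eta^2-r^2) - e^2 = {parabolic_test:.6f}")
print(f"line:     y = {intercept:.6f} {slope:+.6f} x")
print(f"parabola: y = {const:.6f} {lin:+.6f} X {quad:+.6f} X^2")
```

Only $r$, $\eta$, $\sqrt{\beta_1}$ and $\phi_2$ enter the parabola, apart from the sign taken from $e$, which is the economy claimed for (lxv.).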

In the table below and in the curves of Diagram I. the values of the mean of 
the arrays, as found from line, parabola, and cubic, are given and compared with 


observation. 


case been used, as this condition is not fulfilled. The axes 2’, y’ actually taken for woodruff were those 


through the third whorl and through six branches. 
An obvious warning about the signs of the sums of the products may be given which may save 
computators some trouble. The axes being taken positive, as in the accompanying 


figure, then the sums of the products for Nj; and Ig; are positive in the 1* and ~ Fe ie 

3", negative in the 2™ and 4 quadrants. For Ily, and Hy they are positive 

in the 1* and 4" quadrants and negative in the 2™ and 3" quadrants. In ie, gna 1st Ey 
the figure the axes are taken so as to suit the « and y-directions of the table on ie 


p. 31. Care must, of course, be paid to this point. The products may also 
he found from the y,,’s in the manner indicated on p. 35, footnote. They were thus verified in this case. 


E 
TABLE II.—Mean Branches to each Whorl.

Position of whorl, x_p.       0.        1.      2.      3.      4.      5.      6.
y_xp from line . . . .     [7.046]    6.907   6.767   6.628   6.488   6.349   6.210
y_xp from parabola . .     [6.546]    6.777   6.854   6.775   6.541   6.151   5.607
y_xp from cubic  . . .     [6.117]    6.750   6.889   6.758   6.443   6.192   6.007

Observed . . . . . . .        —       6.780   6.813   6.813   6.486   6.172    [?]
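
The first two rows of Table II. can be checked directly from the curves given above. The following is a minimal sketch, assuming the straight line (a.) and the vertex form of the parabola (b.) exactly as printed; the function names are illustrative only and form no part of the memoir.

```python
def line(x):
    # (a.) straight line; x is the actual position of the whorl
    return 7.046087 - 0.139408 * x


def parabola(x):
    # (b.) parabola in its second (vertex) form
    return 6.853561 - 0.077592 * (x - 1.991535) ** 2


positions = range(7)  # whorls 0 to 6; position 0 is an extrapolation
line_row = [round(line(x), 3) for x in positions]
parabola_row = [round(parabola(x), 3) for x in positions]

print("line     ", line_row)
print("parabola ", parabola_row)
```

Both rows agree with the tabulated values to the three places printed.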


I think we may safely say that in the relationship of branches to position of the
whorl in woodruff we have a case of homoscedastic correlation, which is effectively
described by a parabolic regression curve. Thus, in a case of this kind, it is only
needful, besides the moments up to the fourth of the x-character, to find the
correlation coefficient $r$ and the correlation ratio $\eta$.


(9.) Illustration B.—On the Correlation between Age and Head Height in Girls. 


The data for this are taken from my School Measurement series, and involve the 
auricular heights of 2272 girls between the ages of 3 and 22. There was considerable
paucity of material at the extreme ends of the range, and accordingly as our correlation 
curves are all obtained by weighting the observations, we can hardly expect good fits 
near 3 or 22 years of age. The actual correlation table is given as Table III. 
SHEPPARD'S corrections were applied throughout, and the unit of height is 2 millims. 

In the first place the means, standard deviations, and third moments of all the arrays
of heights for different years of age were determined. These are given at the foot of
Table III, but in actually calculating the constants more places of decimals were 
used. Then the first six moments of the frequency of the ages were found and the 
first four moments of the height frequencies. These are the x and y-frequencies. 
They give us :— 


TABLE III.—Correlation between Age and Auricular Height of Head in Girls.


Age. 
3-4. 4-5. 5-6. 6-7. 7-8. 8-9. 9-10. 10-11. 11-12. 12-13. 13-14. 14-15. 15-16.
millims. 
102 *25-104 ‘25 — 1 1 — res ee pes a os, : = ao poi 
104 *25-106 °25 ~ — — 2 = 1 1 1 2 1 2 _ 
106 *25-108 +25 _ — 1 = 1 = 1 iss 4 i 2 eee 
108 ‘25-110 +25 — -- _ il 5 2 1 4 2 2 4 1 
110 *25-112 +25 — — 1 3 1 5 12 3 6 5 3 9 
112 25-114 -25 — — 1 —_ 4 3 10 8 6 9 4 3 
114 :25-116 ‘25 1 — 3 4 7 8 15 14 ll 16 10 7 
116 -25-118 -25 — 2 2 9 9 7 10 23 15 18 13 9 
118 -25-120 25 _ 2 2 4 13 22 24 25 37 A 23 ll 
: 120 -25-122 ‘25 — 2 3 6 9 19 25 29 34 41 32 21 
| 3 122 -25-124 25 oa — 8 3 7 17 23 34 38 33 21 22 
: 124 *25-126 -25 _ — — 1 6 19 18 33 29 40 32 23 
oO 
ic 126 25-128 *25 — — 1 6 9 10 8 21 27 27 32 20 
128 *25-180 -25 — — _ — _ 6 9 17 16 20 39 25 
130 *25-132 25 — — — 1 3 6 5 7 13 17 17 15 
132 -25-184 -25 —- a — _ 1 — i 8 10 13 8 5 
134 *25-136 *25 -- -- — _ 1 il 3 4 4 9 11 13 
136 *25-138 *25 — — _ _ — — 3 2 2 10 4 5 
138 *25-140 +25 — _ — — _ —_ — 2 3 3 2 opt 
140 °25-142 -25 — _— — — _ — 1 — 2 1 2 4 
142 “25-144 °25 — = = = aa — 1 — — _ 2 3 
144 °25-146 °25 _ = ae = Se = a a _ = — _ 
146 *25-148 *25 _ — a = — = — — _ — — — 
DOG Euim sf at <% 1 7 18 40 76 125 177 235 261 309 263 198 
ra i 1152500 | 116°9643 | 117°4722 | 119°1000 | 120°3026 | 121°6340 | 121-7246 | 122°8160 | 123°1427 | 123-8908 | 124°8622 | 125-7146 
1-millim. units 
Standard deviation 
a 0 2 8853 2 ‘9276 2 9641 2 9882 2 6366 3 3877 29653 32089 | | 3:2061 3 °3589 35865 
2-millim. units 
f 4 | roa era aaa —> Sap (RG Sarr aay age pe = 1] 
ee ae” } 0 — 42-992 |~ 18-108 |— 7-679 |+ 1°782 |- 6171 |+ 15-993 |+ 2330 |+ .0-238 |+ 8-219 |— 7-286 |+ 3-015 
2-millim, units | 
Height Constants (moments in 2-millim. units).
Mean height = 124.0467 millims.
$\sigma_y = 3.454125$,
$\mu_2 = 11.930977$,
$\mu_3 = 5.206247$,
$\mu_4 = 438.639638$,
$\beta_1' = 0.015960$,
$\beta_2' = 3.081454$.
Further,
$\Sigma_M = 2.093366$ millims.,
$\lambda_2 = 4.382181$, $\lambda_4 =$ [illegible], in 1-millim. units.
Hence
$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = 0.062340$.

Age Constants (moments in year units).
Mean age = 12.7007.
$\sigma_x = 3.064819$,
$\nu_2 = 9.393110$,
$\nu_3 = 1.051882$,
$\nu_4 = 239.157055$,
$\nu_5 = 104.298702$,
$\nu_6 = 9536.265059$,
$\beta_1 = 0.001335$,
$\beta_2 = 2.710593$,
$\beta_3 = 0.014093$,
$\beta_4 = 11.506681$,
$\sqrt{\beta_1} = +0.036538$,
$\phi_2 = 1.709258$,
$\phi_3 = 0.250123$,
$\phi_4 = 4.158032$.
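
The $\beta$'s and $\phi$'s of the age distribution follow from its moments alone. The sketch below takes the age moments as $\nu_2 = 9.393110$, $\nu_3 = 1.051882$, $\nu_4 = 239.157055$, $\nu_5 = 104.298702$, $\nu_6 = 9536.265059$. It assumes $\beta_1 = \nu_3^2/\nu_2^3$, $\beta_2 = \nu_4/\nu_2^2$, $\beta_3 = \nu_3\nu_5/\nu_2^4$, $\beta_4 = \nu_6/\nu_2^3$, and $\phi_2 = \beta_2 - \beta_1 - 1$, $\phi_3 = (\beta_3 - \beta_1\beta_2 - \beta_1)/\sqrt{\beta_1}$, $\phi_4 = \beta_4 - \beta_2^2 - \beta_1$. These forms are assumptions here, since (li.) and (liv.) are not reproduced in this part of the memoir, although they return the printed constants. The function name is hypothetical.

```python
def beta_phi(nu2, nu3, nu4, nu5, nu6):
    """Return (beta1, beta2, beta3, beta4, phi2, phi3, phi4) from central moments.

    Assumed definitions (see the lead-in); nu3 must not be zero, because
    phi3 is divided by sqrt(beta1).
    """
    beta1 = nu3 ** 2 / nu2 ** 3
    beta2 = nu4 / nu2 ** 2
    beta3 = nu3 * nu5 / nu2 ** 4
    beta4 = nu6 / nu2 ** 3
    root_beta1 = nu3 / nu2 ** 1.5  # sqrt(beta1), carrying the sign of nu3
    phi2 = beta2 - beta1 - 1.0
    phi3 = (beta3 - beta1 * beta2 - beta1) / root_beta1
    phi4 = beta4 - beta2 ** 2 - beta1
    return beta1, beta2, beta3, beta4, phi2, phi3, phi4


# Age constants of Illustration B
age = beta_phi(9.393110, 1.051882, 239.157055, 104.298702, 9536.265059)
for name, value in zip(("beta1", "beta2", "beta3", "beta4", "phi2", "phi3", "phi4"), age):
    print(f"{name} = {value:.6f}")
```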


In the next place the products were worked out and referred to the means with 
the following results :—* 


$p_{11} = 3.113712$, whence $r = 0.294128$,
$p_{21} = -1.957022$, $\epsilon = -0.071065$,
$p_{31} = 74.447616$, $\zeta = -0.048576$,
$p_{41} = -108.701559$, $\theta = -0.470126$.

Further, from $\Sigma_M$, $\eta = 0.303024$.
In deducing the product-moments after they had been referred to the means, the proper SHEPPARD'S corrections were introduced. These are, if $\{p_{11}\}$, $\{p_{21}\}$, $\{p_{31}\}$, $\{p_{41}\}$ represent the uncorrected moments :—

$$p_{11} = \{p_{11}\}, \qquad p_{21} = \{p_{21}\},$$
$$p_{31} = \{p_{31}\} - \tfrac{1}{4}\{p_{11}\}, \qquad p_{41} = \{p_{41}\} - \tfrac{1}{2}\{p_{21}\},$$

the units of grouping being the units throughout.

* These products were in this case (as in all other cases) verified by calculating from the means of the arrays $Y_{x_p}$, the expressions

$$\frac{S\{n_{x_p} Y_{x_p}(x_p - \bar{x})\}}{N}, \quad \frac{S\{n_{x_p} Y_{x_p}(x_p - \bar{x})^2\}}{N}, \quad \frac{S\{n_{x_p} Y_{x_p}(x_p - \bar{x})^3\}}{N}, \quad \frac{S\{n_{x_p} Y_{x_p}(x_p - \bar{x})^4\}}{N}.$$

Of course it is easiest to calculate these products about some arbitrary origin coinciding with the abscissa of one array. If these products be then $p'_{11}$, $p'_{21}$, $p'_{31}$, $p'_{41}$, and $\bar{x}'$ be the mean, we have

$$p_{11} = p'_{11},$$
$$p_{21} = p'_{21} - 2\bar{x}'\,p'_{11},$$
$$p_{31} = p'_{31} - 3\bar{x}'\,p'_{21} + 3\bar{x}'^2\,p'_{11},$$
$$p_{41} = p'_{41} - 4\bar{x}'\,p'_{31} + 6\bar{x}'^2\,p'_{21} - 4\bar{x}'^3\,p'_{11}.$$
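
The transfer of the products from an arbitrary origin to the mean, and the corrections for grouping, may be sketched as follows. The array frequencies and array means used are purely hypothetical figures chosen to exercise the formulae, and the function names are illustrative. It is assumed, as in the footnote, that the $y$-character is already measured from its mean, so that no term independent of the $p'$ arises.

```python
from math import comb


def products_about_mean(p_prime, xbar):
    """Transfer [p'11, p'21, p'31, p'41] (x about an arbitrary origin,
    y about its mean) to the mean xbar of the x-character."""
    out = []
    for k in range(1, 5):
        total = 0.0
        for j in range(1, k + 1):
            total += comb(k, j) * (-xbar) ** (k - j) * p_prime[j - 1]
        out.append(total)
    return out


def sheppard_products(raw):
    """Corrections for grouping, the unit of grouping being the unit."""
    p11, p21, p31, p41 = raw
    return [p11, p21, p31 - p11 / 4.0, p41 - p21 / 2.0]


# Hypothetical arrays: abscissae, frequencies and array means (not from the memoir)
xs = [0, 1, 2, 3, 4]
ns = [3, 5, 8, 4, 2]
means = [6.0, 6.8, 7.0, 6.5, 5.9]

N = sum(ns)
ybar = sum(n * m for n, m in zip(ns, means)) / N
Y = [m - ybar for m in means]              # array means measured from the general mean
xbar = sum(n * x for n, x in zip(ns, xs)) / N

p_prime = [sum(n * y * x ** k for n, y, x in zip(ns, Y, xs)) / N for k in range(1, 5)]
direct = [sum(n * y * (x - xbar) ** k for n, y, x in zip(ns, Y, xs)) / N for k in range(1, 5)]
shifted = products_about_mean(p_prime, xbar)

print(shifted)
print(direct)
```

The two printed lists agree, which is the verification described in the footnote carried out on invented figures.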
From the constants for the arrays, I found 


$$\chi_1 - 1 = -0.000675, \qquad \chi_2 = -0.007198.$$
Whence the probable error of $\eta$ was determined by (xxxiii.). Its value was*

Probable error of $\eta$ = 0.012913.

If found from the simple formula $0.67449\,(1-\eta^2)/\sqrt{N}$, the value is 0.012851. We
accordingly are again forced to the conclusion that the probable error of $\eta$ may for
practical purposes be found from this simple formula, instead of the complicated result
(xxxiii.). Although both $\chi_1 - 1$ and $\chi_2$ are small, it is very doubtful whether we
can legitimately consider the system as homoscedastic. The dotted line ab of Diagram II.
would fairly well represent increasing variability with age. The skewness of the arrays is
relatively small and changes sign so frequently, that we can certainly not attribute any law
to such heteroclitic tendencies as there are. They are probably due to errors of random
sampling from truly isocurtic material.
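
The two values of the probable error just compared can be reproduced in a few lines. The sketch assumes that the five contributions quoted in the footnote on the terms of (xxxiii.) are to be summed and divided by $N$ to give the square of the standard deviation of $\eta$, and that the simple formula is $0.67449\,(1-\eta^2)/\sqrt{N}$. With $N = 2272$ and $\eta = 0.303024$ both readings return the printed figures. The function names are illustrative.

```python
from math import sqrt


def probable_error_simple(eta, n):
    # simple formula: 0.67449 (1 - eta^2) / sqrt(N)
    return 0.67449 * (1.0 - eta ** 2) / sqrt(n)


def probable_error_full(terms, n):
    # assumed reading of (xxxiii.): variance of eta = (sum of the terms) / N
    return 0.67449 * sqrt(sum(terms) / n)


N = 2272
eta = 0.303024
terms = [0.824785, 0.001870, 0.004673, -0.000472, 0.001888]

simple = probable_error_simple(eta, N)
full = probable_error_full(terms, N)
print(f"simple formula: {simple:.6f}")
print(f"full formula:   {full:.6f}")
```

The first of the five terms is $(1-\eta^2)^2$, so the simple formula amounts to keeping that term alone.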

It will be seen that the height frequencies with $\beta_1' = 0.0160$ and $\beta_2' = 3.0815$ do not
differ very much from a normal distribution ; in fact, we can lay no stress on the
heteroclisy of the system at all. But the values of the standard deviations of the
arrays, or the graph of $\sigma_{n_x}/\sigma_y$, certainly shows increasing variation with increasing age,
a phenomenon with which one is familiar in a variety of other human characters.†

This heteroscedasticity, due to increasing variation with growth, would hardly have
been anticipated from a mere inspection of the smallness of $\chi_1$; it is somewhat
obscured by the irregular values of the standard deviations of the small arrays at
the adult end of the age range. The mean value of the standard deviation of the
weighted arrays is $\sigma_y\sqrt{1-\eta^2} = 3.2992$ in 2-millim. units.

We now turn to the regression curves to see how far the conditions for the 
different types are satisfied. We have 

$$\eta^2 - r^2 = 0.005312,$$
$$\phi_2(\eta^2 - r^2) - \epsilon^2 = 0.004030,$$
$$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = 0.000604.$$

But the first should be zero, if the regression be linear; the second, if it be
parabolic ; and the third, if it be cubical.

* The contributions of the successive terms of (xxxiii.) are in fact given by

$$\Sigma_\eta^2 = \frac{1}{N}\{0.824785 + 0.001870 + 0.004673 - 0.000472 + 0.001888\}.$$

† See PEARSON: ‘The Chances of Death and other Studies of Evolution,’ vol. I., pp. 296, 307, 310, 314.

We see increasing approximation to fulfilment of the several conditions. Referred 
to axes through the mean age and head height, the following are the regression 
curves* :— 

(a.) Straight line:

$$Y_{x_p} = 0.662979\,X_p .$$

(b.) Parabola (from equation (lxv.)):

$$Y_{x_p} = 0.055749 + 0.667570\,X_p - 0.041001\,X_p^2 .$$

(c.) Cubic (from equation (lvi.)):

$$Y_{x_p} = 0.280194 + 0.722886\,X_p - 0.029580\,X_p^2 - 0.002223\,X_p^3 .$$

(c′.) Cubic (from equation (lix.)):

$$Y_{x_p} = 0.266076 + 0.812249\,X_p - 0.028004\,X_p^2 - 0.005740\,X_p^3 .$$

(c′) will not give as good results as (c), for it depends on a use of the condition
(lvii.) which is not absolutely fulfilled.
The following table gives the values in the case of the four curves :— 


TABLE IV.—$y_{x_p}$ = Mean Auricular Height of Girl's Head at Given Age.

x_p = age.   Regression line.   Regression parabola.†   Cubic (c).   Cubic (c′).   Observed.
    3.5          117.95              114.49              116.90       118.94       115.25
    4.5          118.61              115.87              117.66       118.94       116.96
    5.5          119.27              117.17              118.42       119.16       117.47
    6.5          119.94              118.39              119.24       119.57       119.10
    7.5          120.60              119.52              120.08       120.14       120.30
    8.5          121.26              120.57              120.93       120.84       121.63
    9.5          121.92              121.55              121.78       121.62       121.72
   10.5          122.59              122.43              122.62       122.45       122.82
   11.5          123.25              123.24              123.42       123.26       123.14
   12.5          123.91              123.97              124.18       124.15       123.89
   13.5          124.58              124.61              124.88       124.95       124.86
   14.5          125.24              125.17              125.52       125.65       125.71
   15.5          125.90              125.65              126.07       126.22       126.16
   16.5          126.57              126.05              126.52       126.68       126.53
   17.5          127.23              126.36              126.87       126.93       126.91
   18.5          127.89              126.59              127.09       126.96       127.02
   19.5          128.55              126.75              127.18       126.74       129.56
   20.5          129.22              126.81              127.11       126.22       123.82
   21.5          129.88              126.80              126.88       125.38       126.50
   22.5          130.54              126.71              126.48       124.28       125.25

* $Y_{x_p}$ is here measured in millimetres and $X_p$ in years.
† The maximum ordinate is at vertex of parabola, i.e., $X_p = 8.1409$, or age 20.84; its magnitude = 126.82.
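
The entries of Table IV. for the line, the parabola and cubic (c), together with the vertex quoted in the footnote, can be recovered from the printed coefficients. The following is a sketch only, assuming the mean age 12.7007 and mean height 124.0467 millims. as origins; the function and variable names are illustrative.

```python
MEAN_AGE = 12.7007       # years
MEAN_HEIGHT = 124.0467   # millims.


def line(X):
    return 0.662979 * X


def parabola(X):
    return 0.055749 + 0.667570 * X - 0.041001 * X ** 2


def cubic_c(X):
    return 0.280194 + 0.722886 * X - 0.029580 * X ** 2 - 0.002223 * X ** 3


def table_row(age):
    """Mean height (millims.) at a given age from line, parabola and cubic (c)."""
    X = age - MEAN_AGE
    return tuple(round(MEAN_HEIGHT + f(X), 2) for f in (line, parabola, cubic_c))


# Vertex of the parabola: dY/dX = 0
vertex_X = 0.667570 / (2 * 0.041001)
vertex_age = MEAN_AGE + vertex_X
vertex_height = MEAN_HEIGHT + parabola(vertex_X)

for age in (3.5, 12.5, 22.5):
    print(age, table_row(age))
print(round(vertex_X, 4), round(vertex_age, 2), round(vertex_height, 2))
```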
An examination of this table and the graphs on Diagram II. seems to show :—


(i.) That cubic (c) is considerably better than cubic (c′).

(ii.) That we do get a sensible betterment in passing from parabola to cubic, and,
accordingly, that we must use in this case the cubic to effectively describe the regression
within the range of observation. Probably neither cubic nor parabola would effectively
serve for extrapolation even close to the limits of observation.


Thus the cubic (c’) starting at 3-4 with its point of inflection is clearly 
inadmissible, and the drop after 20 or 21 years of age, shown by both parabola and 
cubic, is, of course, only due to the anomalous character of the few girls over 18 left 
in the schools. Actually the shrinkage of measurements does not begin till at least 
26 years, and is then far more gradual than these curves indicate. 

But, as in all fitting of this kind, we obtain the best fit we can within the range,
entirely at the expense of what may occur just outside the range. For this reason,
as E. PERRIN* has pointed out, a good interpolation curve is usually a bad
extrapolation curve.

We might sum up our results for auricular height with age in girls by saying: 
That the correlation is non-linear, effectively cubic; heteroscedastic, there being 
increasing variability with growth; that while the total height frequency is not very 
far from normal the array frequencies are slightly heteroclitic, but so very irregular in 
sign, that probably we are dealing with a case of isocurtic homoclisy, to which the 
sparsity of data in the extreme arrays gives an appearance of anomic heteroclisy. 


(10.) Illustration C.—On the Skew Correlation between Size of Cell and Size of Body
in Daphnia magna.


Dr. E. WARREN has dealt with this point in a memoir published in ‘Biometrika,’
vol. II., pp. 255-9. The resulting regression curve of size of cell for given size of
body is very far from linear, and it is quite clear that the correlation is skew. It
has already been noted in ‘Biometrika’ that the relationship is considerably obscured
by the irregularities produced by ecdysis. Our object at present, however, is purely
theoretical, namely, to show how a certain system of constants and of curves describes
the actual correlationship, and for this purpose Dr. WARREN'S observations form as
good material for graduation as we could expect to find. The following Table V.
gives the observations with the working scales attached. I must refer to
Dr. WARREN'S paper (p. 256) for the relation between the units of grouping on the
working scales and those of the actual measurements on body and cell lengths. As
far as correcting the raw moments is concerned, SHEPPARD'S corrections were used
for the cell sizes, but not for the body lengths, because the number of individuals in
the latter case was perfectly arbitrary and there is no approach to high contact. The
product moments were also uncorrected. The product moments were found in both
ways (see p. 35, footnote) and the results thus verified.

* ‘Biometrika,’ vol. III., p. 99.

Table V. gives the means, standard deviations, and third moments of the arrays ; 
the latter are all small and superficially irregular in sign. I think we may say that 
there is no marked and continuous heteroclisy. On the other hand, I think we may 
say that while the clitic curve deviates to and fro from a zero base, the scedastic 
curve would fit better to a parabolic curve than to the straight line which is its 
mean. In other words, the variability of the cells increases with size of body (i.e.,
growth) up to a certain stage and then decreases again. This result is obscured by
the fall of the variability after each ecdysis. Roughly the ecdyses produce a rhythm 
in all three curves, the regression curve, the scedastic curve, and the clitic curve. 
When the means of the arrays are above the regression cubic, then the ordinates of 
the scedastic curve are above their mean and those of the clitic curve show positive 
skewness ; when they are below the regression curve, we have lessened variability 
and negative skewness. In other words, the ecdyses are accompanied by lessened 
cell variability and negative skewness of distribution. I think we may state that 
there is a nomic heteroscedasticity due to growth of body, giving first an increased 
variability with growth and afterwards a decrease with age. There is probably 
isocurtic homoclisy. Both of these are, however, obscured by a semi-rhythmic 
heteroscedasticity and heteroclisy introduced by the ecdyses. 

We now turn to the constants of the cell and body length distributions, merely 
noting that all these constants are given in terms of the units of the working scales. 


Cell Constants.
Mean cell = 9.268657,
$\sigma_y = 2.541734$,
$\mu_2 = 6.460410$,
$\mu_3 = 2.142362$,
$\mu_4 = 123.921496$,
$\beta_1' = 0.017021$,
$\beta_2' = 2.969111$.
Further,
$\Sigma_M = 1.454600$,
$\lambda_2 = 2.115862$,
$\lambda_4 = 15.142840$.
Hence
$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = 0.095615$.

Body Length Constants.
Mean body length = 8.502488,
$\sigma_x = 3.864784$,
$\nu_2 = 14.936562$,
$\nu_3 = -5.125806$,
$\nu_4 = 432.769533$,
$\nu_5 = -425.276682$,
$\nu_6 = 15192.5875$,
$\beta_1 = 0.007885$,
$\beta_2 = 1.939793$,
$\beta_3 = 0.043796$,
$\beta_4 = 4.559091$,
$\sqrt{\beta_1} = -0.088798$,
$\phi_2 = 0.931908$,
$\phi_3 = -0.232167$,
$\phi_4 = 0.788409$.
We have next the product moments referred to the means

$p_{11} = 3.892863$, whence $r = 0.394862$,
$p_{21} = -12.104322$, $\epsilon = -0.281831$,
$p_{31} = 127.348064$, $\zeta = 0.098578$,
$p_{41} = -541.433455$, $\theta = -0.759344$.

Further, from $\Sigma_M$,
$\eta = 0.572287$.

From the constants for the arrays I deduced
$$\chi_1 - 1 = -0.108148, \qquad \chi_2 = 0.088323.$$


These are higher values of $\chi_1 - 1$ and $\chi_2$ than we have found in the first two
illustrations.
We now obtain, showing the contribution of each term of (xxxiii.),

$$\Sigma_\eta^2 = \frac{1}{N}\{0.452240 - 0.002528 + 0.010803 - 0.013180 - 0.027875\}.$$

Whence probable error of $\eta = 0.67449\,\Sigma_\eta = 0.0097$.

Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found it
equal to 0.0101. The difference is greater than in the two previous illustrations, but
is only 0.0004, and this would have no significance in any practical use of the probable
error. We again conclude, therefore, that (xxxiv.) is sufficiently close to replace
(xxxiii.) in practice.

For the mean standard deviation of the weighted arrays we have

$$\sigma_y\sqrt{1-\eta^2} = 2.084358.$$

If we now examine the criteria for the nature of the regression, we have

$$\eta^2 - r^2 = 0.171596,$$
$$\phi_2(\eta^2 - r^2) - \epsilon^2 = 0.080483,$$
$$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = 0.079457.$$

We should conclude, therefore, that linear regression is inadmissible, but that
parabolic or cubic will be moderately successful, the latter not very much better than
the former. Our moderate success only in this case is, of course, due to the irregularity
of the results to be graduated, the influence of the ecdyses being so disturbing
that we really need a curve periodically varying from the graduated regression curve.
We have the following regression curves :—

(a.) Straight line:

$$Y_{x_p} = 0.259687\,X_p .$$

(b.) Parabola from (lxv.):

$$Y_{x_p} = 1.097690 + 0.236135\,X_p - 0.073490\,X_p^2 .$$

The maximum occurs when $X_p = 1.6066$, and is given by $Y_{x_p} = 1.2874$, thus occurring
within the limits of observation.*

(c.) Cubic from (lix.):

$$Y_{x_p} = 0.752856 + 0.193058\,X_p - 0.049817\,X_p^2 + 0.001710\,X_p^3 .$$

In all these cases $Y_{x_p}$ and $X_p$ are measured from the means of the cell and body
lengths, or from 9.268657 and 8.502488 respectively.

Table VI. gives the calculated and observed results, and the whole system is 
represented in Diagram III. Either the parabola or cubic graduates quite well the 
results, allowing for the periodic deviation, and we may fairly describe the system as 
a heteroscedastic cubic regression with isocurtic homoclisy. The correlation ratio is 
very sensibly different from the correlation coefficient. The regression cubic does not 
differ widely from that given in ‘ Biometrika,’ which was obtained without weighting 
the means of the arrays, and by simply striking the best cubic of the given type 
through the points. 


TABLE VI.—$y_{x_p}$ = Mean Cell Length for Given Body Length in Daphnia.

x_p = body length.   Regression line.   Regression parabola.   Regression cubic.   Observed.
        1                 7.320               4.458                 5.047            5.300
        2                 7.580               5.724                 6.190            5.833
        3                 7.840               6.842                 7.166            7.790
        4                 8.099               7.813                 7.986            8.050
        5                 8.359               8.638                 8.661            9.473
        6                 8.619               9.315                 9.200            8.436
        7                 8.879               9.846                 9.613            8.596
        8                 9.138              10.229                 9.912           10.267
        9                 9.398              10.466                10.105           10.761
       10                 9.658              10.555                10.205           11.027
       11                 9.917              10.498                10.220           10.953
       12                10.177              10.293                10.161            9.100
       13                10.437               9.942                10.038            9.000
       14                10.696               9.443                 9.861           10.036
       15                10.956               8.798                 9.642           10.317


* Actual values on working scales, $x_p = 10.1091$ and $y_{x_p} = 10.5560$.


(11.) Illustration D.—On the Skew Correlation between Number of Branches to the
Whorl and Position of the Whorl on the Stem in Equisetum arvense.


I have selected this example not on account of any biological importance, because
the material is—especially with regard to the first and last two whorls—unsatisfactory
either on account of irregularity or of insufficiency of material. It has been taken
purely from its statistical interest, because it gives a series with markedly skew
correlation, having a regression curve of a rough S-shaped character. If we omit
the first and last whorls, we get, as I have already shown,* a remarkably close fit
with a cubical regression curve. My present object, however, is not to consider any
law of growth, but merely a mass of statistical material, to be dealt with by the
processes of the present paper.

We may anticipate that the irregularities of the series, indicated in the memoir
just referred to, will make themselves manifest in a less satisfactory fitting of the
regression curve than occurs when we deal with the more homogeneous group of
equally weighted whorls fitted in the diagram of that paper. Table VII. gives the
data, with the means, standard deviations, and third moments of each array.

The axis of $x$ shall be taken to give the position of the whorl on the stem and that
of $y$ to denote the number of branches. We require the regression curve of $y$ on $x$,
or the probable number of branches on a whorl in a given position. We shall not
use SHEPPARD'S corrections for the moments of either the $x$ or $y$-characters, as high
contact certainly does not hold for both at the low-value ends of their ranges.

We have the following constants :— 


Position Constants.
Mean position = 6.403315,
$\sigma_x = 3.542604$,
$\nu_2 = 12.550046$,
$\nu_3 = 8.249534$,
$\nu_4 = 319.515824$,
$\nu_5 = 644.095176$,
$\nu_6 = 11203.5814$,
$\beta_1 = 0.034429$,
$\beta_2 = 2.028625$,
$\beta_3 = 0.214190$,
$\beta_4 = 5.667884$,
$\sqrt{\beta_1} = 0.185550$,
$\phi_2 = 0.994196$,
$\phi_3 = 0.592384$,
$\phi_4 = 1.518136$.

Branch Constants.
Mean number of branches = 7.216851,
$\sigma_y = 3.278499$,
$\mu_2 = 10.748557$,
$\mu_3 = -24.313478$,
$\mu_4 = 245.811660$,
$\beta_1' = 0.476044$,
$\beta_2' = 2.127658$.
Further,
$\Sigma_M = 2.789949$,
$\lambda_2 = 7.783815$,
$\lambda_4 = 140.441685$.
Hence
$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = -0.170508$.


We have next the product moments referred to the means

$p_{11} = -8.225585$, whence $r = -0.708222$,
$p_{21} = -21.471821$, $\epsilon = -0.390436$,
$p_{31} = -205.084042$, $\zeta = +0.029733$,
$p_{41} = -917.984938$, $\theta = -0.960212$.

Further, from $\Sigma_M$,
$\eta = 0.850984$.

From the constants for the arrays we deduce
$$\chi_1 - 1 = -0.356367, \qquad \chi_2 = -0.312952.$$

* ‘Proc. Roy. Soc.,’ vol. 71, p. 308.
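
The passage from the product moments to $r$, $\epsilon$, $\zeta$ and $\theta$ may be sketched as below for the Equisetum figures. The definitions used are an assumption, not quoted from this part of the memoir: each constant is taken as the mean product of $Y/\sigma_y$ with $X/\sigma_x$ or with the corresponding polynomial in $X/\sigma_x$, giving $\epsilon = q_{21} - r\sqrt{\beta_1}$, $\zeta = q_{31} - r\beta_2$, $\theta = q_{41} - r\beta_3/\sqrt{\beta_1}$, where $q_{k1} = p_{k1}/(\sigma_x^k\sigma_y)$. They return the printed values to about the fifth decimal place. The function name is illustrative.

```python
def correlation_constants(p, sigma_x, sigma_y, root_beta1, beta2, beta3):
    """Return (r, epsilon, zeta, theta) from p = (p11, p21, p31, p41).

    Assumed definitions; see the lead-in paragraph.
    """
    q = [p[k] / (sigma_x ** (k + 1) * sigma_y) for k in range(4)]  # q11..q41
    r = q[0]
    epsilon = q[1] - r * root_beta1
    zeta = q[2] - r * beta2
    theta = q[3] - r * beta3 / root_beta1
    return r, epsilon, zeta, theta


# Equisetum: product moments and position constants as printed
equisetum = correlation_constants(
    p=(-8.225585, -21.471821, -205.084042, -917.984938),
    sigma_x=3.542604,
    sigma_y=3.278499,
    root_beta1=0.185550,
    beta2=2.028625,
    beta3=0.214190,
)
print([round(v, 6) for v in equisetum])
```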


We now obtain, showing the contribution of each term of (xxxiii.):

$$\Sigma_\eta^2 = \frac{1}{N}\{0.076080 - 0.157932 + 0.055359 + 0.079662 + 0.038579\}.$$

Whence probable error of $\eta = 0.67449\,\Sigma_\eta = 0.0054$.

Had we calculated the probable error of $\eta$ from (xxxiv.) we should have found it
equal to 0.0049. The difference 0.0005 is not of importance for practical purposes.
Yet in this case it is clear that the values of $\chi_1 - 1$ and $\chi_2$ are very sensible. Thus we
see that a very marked heteroscedastic and heteroclitic system with continuously
changing standard deviation and skewness scarcely affects for practical purposes
(i.e., to three significant figures) the probable error of $\eta$. All four of our illustrations
therefore confirm the conclusion that :

For practical purposes the probable error of the correlation ratio, $\eta$, may be taken
as $0.67449\,(1-\eta^2)/\sqrt{N}$.

Our Diagram IV. gives the values of the relative standard deviations of the arrays,
or, $\sigma_{n_x}/\sigma_y$, the horizontal line giving $\sqrt{1-\eta^2} = 0.5252$, or the mean value of the relative
standard deviations of the weighted arrays. We have also the clitic curve giving
$\tfrac{1}{2}\sqrt{\beta_1}$ for each array.* The remarkable smoothness of these scedastic and clitic curves
in this case indicates how far certain types of correlation surfaces diverge from pure
normality of distribution, the divergence being obviously nomic.

We now turn to the regression curves and write down the conditions for the
different types; the three expressions should be zero for linear, parabolic, and
cubical regression respectively

$$\eta^2 - r^2 = 0.222596,$$
$$\phi_2(\eta^2 - r^2) - \epsilon^2 = 0.068864,$$
$$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = 0.010127.$$

* $\tfrac{1}{2}\sqrt{\beta_1}$ = difference between mode and mean divided by standard deviation = skewness in the case of
skew-curves of Type III. (‘Phil. Trans.,’ A, vol. 186, p. 373), and may be taken as a reasonable measure of
the skewness for those cases in which the fuller form involving $\beta_2$ would involve too laborious calculations.
If in equation (xii.) of the present memoir we put $\beta_2 = 3 +$ a small quantity, and remember that $\beta_1$ is itself
a small quantity, we see that the more correct formula for the skewness involving $\beta_2$ reduces, neglecting
terms of 2nd order, to $\tfrac{1}{2}\sqrt{\beta_1}$.
We see at once that the straight line is inadmissible, the parabola will not be very
good, and the cubic only moderately appropriate. The conditions are not nearly so 
closely fulfilled as in the cases of woodruff and head heights; the last two are better 
than in the case of Daphnia cells, but while the deviations in the case of Daphnia 
were irregular, there being no approximate smoothness in the scedastic or clitic 
curves, we shall find here more uniform deviations which would probably be partially 
allowed for by a quartic regression curve. 
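
The three criteria just discussed are readily computed from $\eta$, $r$, $\epsilon$, $\zeta$ and the $\phi$'s. A short sketch, using the Equisetum constants as printed ($\eta = 0.850984$, $r = -0.708222$, $\epsilon = -0.390436$, $\zeta = 0.029733$, $\phi_2 = 0.994196$, $\phi_3 = 0.592384$, $\phi_4 = 1.518136$), is given below; the function name is illustrative only.

```python
def regression_criteria(eta, r, eps, zeta, phi2, phi3, phi4):
    """Expressions that should vanish for linear, parabolic and cubical regression."""
    linear = eta ** 2 - r ** 2
    parabolic = phi2 * linear - eps ** 2
    cubic = parabolic - (zeta * phi2 - eps * phi3) ** 2 / (phi2 * phi4 - phi3 ** 2)
    return linear, parabolic, cubic


equisetum = regression_criteria(
    eta=0.850984, r=-0.708222, eps=-0.390436, zeta=0.029733,
    phi2=0.994196, phi3=0.592384, phi4=1.518136,
)
print([round(v, 6) for v in equisetum])
```

Dividing the second and third results by $\phi_2$ gives the same conditions in the form $\eta^2 - r^2 - \epsilon^2/\phi_2$, and so on.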

The following are the regression curves :—

(a.) Straight line:

$$Y_{x_p} = -0.655423\,X_p .$$

(b.) Parabola from (lxv.):

$$Y_{x_p} = 1.551307 - 0.574171\,X_p - 0.123610\,X_p^2 .$$

The maximum ordinate is at the position $X_p = -2.3225$, or $x_p = 4.0808$, with
maximum number of branches $y_{x_p} = 9.435$.

(c.) Cubic from (lvi.):

$$Y_{x_p} = 1.590413 - 0.987694\,X_p - 0.137641\,X_p^2 + 0.016605\,X_p^3 .$$

In all cases $X_p$ and $Y_{x_p}$ are measured from the mean position and the mean number
of branches, i.e., 6.403315 and 7.216851 respectively.
The following table contains the calculated and observed results :—


TABLE VIII.—Mean Number of Branches to each Whorl in Equisetum.

Position.   Regression line.   Regression parabola.   Regression cubic.   Observed.   Regression cubic (first whorl omitted).
    1           10.758               8.262                 7.506            7.619         [8.207]
    2           10.103               8.900                 9.070            9.294          8.929
    3            9.447               9.291                 9.920            9.627          9.869
    4            8.792               9.434                10.156            9.730         10.161
    5            8.137               9.330                 9.876            9.643          9.911
    6            7.481               8.980                 9.182            9.427          9.224
    7            6.826               8.382                 8.172            8.732          8.205
    8            6.170               7.536                 6.947            7.297          6.962
    9            5.515               6.444                 5.605            5.555          5.599
   10            4.859               5.104                 4.247            3.964          4.223
   11            4.204               3.517                 2.971            2.443          2.939
   12            3.549               1.683                 1.879            1.866          1.854
   13            2.893              -0.399                 1.069            1.462          1.072
   14            2.238              -2.727                 0.641            1.333          0.700
   15            1.582              -5.303                 0.694            1.250          0.844
   16            0.927              -8.126                 1.328            1.000          1.610


In the last column I have placed the results of re-working the whole system,
omitting the first whorl as largely influenced by the ground condition at the foot of
the stem.* The improvement of fit is not sufficiently great to justify a publication of
all the constants for the distribution in this modified case. But there is improvement
for the higher whorls, which are so few in number as to be wholly insignificant when
compared with the weight of the first few low whorls.

It will be noticed at once that the line and the parabola (which gives at the top of 
the stem negative numbers!) are absolutely unsuitable for representing the facts of 
the case. The cubic is better and certainly gives the general trend of the observa- 
tions, but in this our last illustration we have clearly reached the limit of material to 
which such cubical regression can be satisfactorily applied. See Diagram V. 


(12.) Quartic Regression. 


It seemed of some interest in this case of Equisetum to ascertain whether any real
improvement in description would be reached by considering the quartic regression
curve. I briefly indicate the theory in this case as developed from the general
method in the footnote, p. 25. We shall now have

$$Y_{x_p}/\sigma_y = b_0 + b_1\,(X_p/\sigma_x) + b_2\,(X_p/\sigma_x)^2 + b_3\,(X_p/\sigma_x)^3 + b_4\,(X_p/\sigma_x)^4 .$$


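Eliminating $b_0$ and $b_1$, by the processes familiar to us from the case of cubical
regression, we have

$$\begin{aligned}
Y_{x_p}/\sigma_y = r\,(X_p/\sigma_x) &+ b_2\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}\,(X_p/\sigma_x) - 1\} \\
&+ b_3\{(X_p/\sigma_x)^3 - \beta_2\,(X_p/\sigma_x) - \sqrt{\beta_1}\} \\
&+ b_4\{(X_p/\sigma_x)^4 - (\beta_3/\sqrt{\beta_1})\,(X_p/\sigma_x) - \beta_2\} \qquad \text{(lxx.)}.
\end{aligned}$$

Hence as before

$$\begin{aligned}
\epsilon &= \phi_2 b_2 + \phi_3 b_3 + \phi_5 b_4, \\
\zeta &= \phi_3 b_2 + \phi_4 b_3 + \phi_6 b_4, \qquad \text{(lxxi.)} \\
\theta &= \phi_5 b_2 + \phi_6 b_3 + \phi_7 b_4,
\end{aligned}$$

where $\phi_2$, $\phi_3$, and $\phi_4$ are given as before by (li. and liv.), while

$$\phi_5 = \beta_4 - \beta_3 - \beta_2 \qquad \text{(lxxii.)},$$
$$\phi_6 = (\beta_5 - \beta_2\beta_3 - \beta_1\beta_2)/\sqrt{\beta_1} \qquad \text{(lxxiii.)},$$
$$\phi_7 = (\beta_1\beta_6 - \beta_3^2 - \beta_1\beta_2^2)/\beta_1 \qquad \text{(lxxiv.)},$$
$$\beta_5 = \nu_3\nu_7/\nu_2^5, \qquad \beta_6 = \nu_8/\nu_2^4 \qquad \text{(lxxv.)}.$$

Solving, we have

$$b_4 = \frac{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)}{\phi_2\phi_4\phi_7 - \phi_2\phi_6^2 - \phi_4\phi_5^2 - \phi_7\phi_3^2 + 2\phi_3\phi_5\phi_6} \qquad \text{(lxxvi.)},$$

and

$$b_2 = \frac{\epsilon\phi_4 - \zeta\phi_3}{\phi_2\phi_4 - \phi_3^2} - b_4\,\frac{\phi_4\phi_5 - \phi_3\phi_6}{\phi_2\phi_4 - \phi_3^2}, \qquad
b_3 = \frac{\zeta\phi_2 - \epsilon\phi_3}{\phi_2\phi_4 - \phi_3^2} - b_4\,\frac{\phi_2\phi_6 - \phi_3\phi_5}{\phi_2\phi_4 - \phi_3^2} \qquad \text{(lxxvii.)}.$$

Substituting in (lxx.), the solution is completed. The advantage of this form is that
we see clearly the modifications made in $b_2$ and $b_3$ as we pass from cubical to quartic
regression. On the other hand, $\phi_6$ and $\phi_7$, as shown by (lxxv.), involve the 7th and
8th moments of the x-character. These are not only very laborious to calculate, but,
as we have already shown, are as a rule very untrustworthy.

* ‘Roy. Soc. Proc.,’ vol. 71, pp. 308-310.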

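If we proceed as on p. 26, equation (lvii.), we find

$$\eta^2 - r^2 = b_2\epsilon + b_3\zeta + b_4\theta \qquad \text{(lxxviii.)}.$$

Using this and not the third equation of (lxxi.), we replace (lxxvi.) by

$$b_4 = \frac{\left\{\eta^2 - r^2 - \dfrac{\epsilon^2}{\phi_2} - \dfrac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)}\right\}(\phi_2\phi_4 - \phi_3^2)}{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)} \qquad \text{(lxxix.)}.$$

This equation for $b_4$ only involves the 7th and not the 8th moment, but like the
corresponding form (lx.) suffers from being a ratio of small quantities. (lxxvii.)
completes the solution as before.

(lxxvii.) and (lxxix.) in conjunction give us a necessary condition for quartic
regression. We can indeed now write the whole series of conditions as follows :—

Linear regression :

$$\eta^2 - r^2 = 0.$$

Parabolic regression :

$$\eta^2 - r^2 - \epsilon^2/\phi_2 = 0.$$

Cubical regression :

$$\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/\{\phi_2(\phi_2\phi_4 - \phi_3^2)\} = 0.$$

Quartic regression :

$$\eta^2 - r^2 - \frac{\epsilon^2}{\phi_2} - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} - \frac{\{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)\}^2}{(\phi_2\phi_4 - \phi_3^2)(\phi_2\phi_4\phi_7 - \phi_2\phi_6^2 - \phi_4\phi_5^2 - \phi_7\phi_3^2 + 2\phi_3\phi_5\phi_6)} = 0 \qquad \text{(lxxx.)}.$$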


We now have a third possibility: we can get rid of the fourth product moment $\theta$
from the value of $b_4$ and write it:

$$b_4 = \sqrt{\frac{\left\{\eta^2 - r^2 - \dfrac{\epsilon^2}{\phi_2} - \dfrac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)}\right\}(\phi_2\phi_4 - \phi_3^2)}{\phi_2\phi_4\phi_7 - \phi_2\phi_6^2 - \phi_4\phi_5^2 - \phi_7\phi_3^2 + 2\phi_3\phi_5\phi_6}} \qquad \text{(lxxxi.)}.$$

While this value of $b_4$ does not suffer like (lxxix.) from being the ratio of small
quantities, and would a priori appear to save the calculation of $\theta$, yet the right sign of
the root may not be obvious on inspection, so that an actual determination of $\theta$ to find
the sign of $b_4$ may after all be needful. If (lxxx.) were absolutely satisfied, (lxxxi.),
(lxxix.) and (lxxvi.) would lead to identical results; but this will rarely be true in
practice. In any of the three cases $b_2$ and $b_3$ will be given by (lxxvii.). On the
whole, I consider that (lxxxi.) and (lxxvi.) will give the better results, and probably
the former the best, but it will generally require as much arithmetic as the latter.

(13.) Illustration E.—Calculation of the Quartic Regression Curve in the Case
of Equisetum arvense.


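The only new constants required are :

$\nu_7 = 43{,}207.386$, whence $\beta_5 = 1.144882$,
$\nu_8 = 507{,}649.540$, $\beta_6 = 20.463633$,

and :

$\phi_5 = 3.425069$, $\phi_6 = 3.452046$, $\phi_7 = 15.015792$.

These lead us to :

$$\frac{\phi_4\phi_5 - \phi_3\phi_6}{\phi_2\phi_4 - \phi_3^2} = 2.723384, \qquad \frac{\phi_2\phi_6 - \phi_3\phi_5}{\phi_2\phi_4 - \phi_3^2} = 1.211194,$$

$$\Delta = \begin{vmatrix} \phi_2 & \phi_3 & \phi_5 \\ \phi_3 & \phi_4 & \phi_6 \\ \phi_5 & \phi_6 & \phi_7 \end{vmatrix} = 1.745622.$$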
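
These new constants may be checked from the 7th and 8th moments. The sketch below takes $\beta_5 = \nu_3\nu_7/\nu_2^5$, $\beta_6 = \nu_8/\nu_2^4$, $\phi_5 = \beta_4 - \beta_3 - \beta_2$, $\phi_6 = (\beta_5 - \beta_2\beta_3 - \beta_1\beta_2)/\sqrt{\beta_1}$ and $\phi_7 = \beta_6 - \beta_3^2/\beta_1 - \beta_2^2$. These forms are an assumed reading of (lxxii.) to (lxxv.), which agrees with the printed Equisetum values; the function names are illustrative.

```python
def quartic_constants(nu2, nu3, nu7, nu8, beta1, beta2, beta3, beta4):
    """Return (beta5, beta6, phi5, phi6, phi7) under the assumed definitions."""
    root_beta1 = nu3 / nu2 ** 1.5          # sqrt(beta1) with the sign of nu3
    beta5 = nu3 * nu7 / nu2 ** 5
    beta6 = nu8 / nu2 ** 4
    phi5 = beta4 - beta3 - beta2
    phi6 = (beta5 - beta2 * beta3 - beta1 * beta2) / root_beta1
    phi7 = beta6 - beta3 ** 2 / beta1 - beta2 ** 2
    return beta5, beta6, phi5, phi6, phi7


def delta(phi2, phi3, phi4, phi5, phi6, phi7):
    """Symmetric determinant | phi2 phi3 phi5 ; phi3 phi4 phi6 ; phi5 phi6 phi7 |."""
    return (phi2 * phi4 * phi7 + 2 * phi3 * phi5 * phi6
            - phi2 * phi6 ** 2 - phi4 * phi5 ** 2 - phi7 * phi3 ** 2)


beta5, beta6, phi5, phi6, phi7 = quartic_constants(
    nu2=12.550046, nu3=8.249534, nu7=43207.386, nu8=507649.540,
    beta1=0.034429, beta2=2.028625, beta3=0.214190, beta4=5.667884,
)
big_delta = delta(0.994196, 0.592384, 1.518136, phi5, phi6, phi7)
print(round(beta5, 6), round(beta6, 6))
print(round(phi5, 6), round(phi6, 6), round(phi7, 6), round(big_delta, 4))
```

Since the determinant is a small difference of large products, the last places of the $\phi$'s matter; it is here compared only to three decimals.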


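Our successive conditions are therefore :

$$\eta^2 - r^2 = 0.222596,$$
$$\eta^2 - r^2 - \epsilon^2/\phi_2 = 0.069266,$$
$$\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/\{\phi_2(\phi_2\phi_4 - \phi_3^2)\} = 0.010186,$$
$$\eta^2 - r^2 - \frac{\epsilon^2}{\phi_2} - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} - \frac{\{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)\}^2}{(\phi_2\phi_4 - \phi_3^2)\,\Delta} = 0.007200,$$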


whence we see the successive approximations to the fulfilment of the conditions. 
Clearly great gains arise when we pass from linear to parabolic, and from parabolic 
to cubic regression, but the advance is not so conspicuous when we pass to quartic 
regression. 
We have :—

From (lxxvi.): $b_4 = 0.044517$, and $b_2 = -0.648122$, $b_3 = 0.171260$,
From (lxxix.): $b_4 = 0.151842$, and $b_2 = -0.940410$, $b_3 = 0.041981$,
From (lxxxi.): $b_4 = 0.025999$, and $b_2 = -0.597691$, $b_3 = 0.193688$.

The equations to the three corresponding quartics are :

(a). $Y_{x_p} = 1.724611 - 0.913208\,X_p - 0.169311\,X_p^2 + 0.012629\,X_p^3 + 0.000927\,X_p^4$,
(b). $Y_{x_p} = 2.047717 - 0.734966\,X_p - 0.245667\,X_p^2 + 0.003096\,X_p^3 + 0.003161\,X_p^4$,
(c). $Y_{x_p} = 1.668788 - 0.944192\,X_p - 0.156137\,X_p^2 + 0.014283\,X_p^3 + 0.000541\,X_p^4$.

The values of $Y_{x_p}$ and $X_p$ are as before measured from the means, or 7.216851 and
6.403315 respectively.
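
The route from the three equations (lxxi.) to quartic (a) may be sketched thus. The system is solved by determinants for $b_2$, $b_3$, $b_4$, and the bracketed polynomials of (lxx.) are then expanded into powers of $X_p$. The expansion used for the constant and linear terms follows from (lxx.) as read here and is an assumption to that extent; the function names are illustrative. The constants are those printed for Equisetum.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_lxxi(eps, zeta, theta, phi2, phi3, phi4, phi5, phi6, phi7):
    """Solve the three linear equations (lxxi.) for (b2, b3, b4) by determinants."""
    rows = [(phi2, phi3, phi5), (phi3, phi4, phi6), (phi5, phi6, phi7)]
    rhs = (eps, zeta, theta)
    big = det3(rows)
    solution = []
    for col in range(3):
        m = [list(row) for row in rows]
        for k in range(3):
            m[k][col] = rhs[k]          # replace one column by the right-hand side
        solution.append(det3(m) / big)
    return tuple(solution)


def quartic_coefficients(r, b2, b3, b4, sx, sy, root_beta1, beta2, beta3):
    """Coefficients of 1, X, X^2, X^3, X^4 obtained by expanding (lxx.)."""
    c0 = sy * (-b2 - b3 * root_beta1 - b4 * beta2)
    c1 = (sy / sx) * (r - b2 * root_beta1 - b3 * beta2 - b4 * beta3 / root_beta1)
    return (c0, c1, sy * b2 / sx ** 2, sy * b3 / sx ** 3, sy * b4 / sx ** 4)


b2, b3, b4 = solve_lxxi(-0.390436, 0.029733, -0.960212,
                        0.994196, 0.592384, 1.518136,
                        3.425069, 3.452046, 15.015792)
coeffs = quartic_coefficients(-0.708222, b2, b3, b4,
                              sx=3.542604, sy=3.278499,
                              root_beta1=0.185550, beta2=2.028625, beta3=0.214190)
print(round(b2, 6), round(b3, 6), round(b4, 6))
print([round(c, 6) for c in coeffs])
```

The values returned agree with those quoted from (lxxvi.) and with the coefficients of quartic (a) to within a unit or two in the last place printed.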


The values of the observed and calculated ordinates are given in Table IX., and 
the graph of the results in the lower half of Diagram V. 


TABLE IX. Mean Number of Branches to Whorl in Equisetum deduced from Quartic Regression.

Position.   Quartic (a).   Quartic (b).   Quartic (c).   Observed.
    1          7.731          8.269          7.637         7.619
    2          8.950          8.662          9.000         9.294
    3          9.715          9.222          9.800         9.627
    4         10.014          9.674         10.073         9.730
    5          9.858          9.816          9.866         9.643
    6          9.281          9.521          9.240         9.427
    7          8.339          8.740          8.270         8.732
    8          7.109          7.498          7.042         7.297
    9          5.692          5.898          5.656         5.555
   10          4.209          4.116          4.225         3.964
   11          2.816          2.407          2.875         2.443
   12          1.651          1.100          1.745         1.866
   13          0.930          0.600          0.987         1.462
   14          0.857          1.389          0.766         1.333
   15          1.665          4.022          1.259         1.250
   16          3.609          9.133          2.657         1.000
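The calculated columns of Table IX. can be checked by evaluating the three quartics directly. The following is a minimal sketch, assuming the coefficients and means as read from the printed text (the last digit of the mean number of branches is taken from a damaged figure); the names `QUARTICS` and `ordinate` are illustrative only.

```python
# Coefficients of the three quartics as printed, constant term first.
QUARTICS = {
    "a": (1.724611, -0.913208, -0.169311, 0.012629, 0.000927),
    "b": (2.047717, -0.734966, -0.245667, 0.003096, 0.003161),
    "c": (1.668788, -0.944192, -0.156137, 0.014283, 0.000541),
}
MEAN_BRANCHES = 7.216851  # mean of Y (last digit assumed from a damaged print)
MEAN_POSITION = 6.403315  # mean of X


def ordinate(label, position):
    """Mean number of branches given by quartic `label` at a whorl position."""
    x = position - MEAN_POSITION          # abscissa measured from the mean
    y = 0.0
    for coeff in reversed(QUARTICS[label]):
        y = y * x + coeff                 # Horner's rule
    return MEAN_BRANCHES + y              # ordinate restored to the original origin


if __name__ == "__main__":
    for pos in range(1, 17):
        row = "  ".join(f"{ordinate(k, pos):7.3f}" for k in "abc")
        print(f"{pos:2d}  {row}")
```

Agreement with the table to the third decimal depends on carrying all six figures of each coefficient.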


From these results we deduce the following conclusions :— 

(i.) That the use of a quartic instead of a cubic regression curve has not very 
markedly bettered the fit. The failure to get a closer fit lies largely in the nature of 
the material. The number of plants with more than 13 whorls is very few, and their 
contribution allows little weight to the tail of the regression curve. Further, all our
attempts to fit a smooth regression curve show that the observed data are unduly
flattened at the top. If we confine ourselves to a homogeneous series of 110 plants
with ten whorls apiece, we get a remarkably good fit.* The S-shape of the 
regression line as indicated in both cubic and quartic does, however, appear to be 
characteristic of the nature of the plant, and I take it that more ample material 
would allow of a closer analytical description by a simple cubic. I doubt whether for 
practical statistics the use of the quartic will often be requisite. 

(ii.) The comparative failure of the quartic (b) shows us that a formula like (lxxix.)
is of small service. This corresponds fully to our experience in the use of (lx.) in the 
case of the cubic. In both cases we get rid of a high moment by making a certain 
constant the ratio of two small quantities, and experience shows us that the result is 
unsatisfactory. It is accordingly preferable to use formulæ involving high moments
of one variable rather than those involving a ratio of small quantities.

(iii.) The quartic (c) appears as good as, if not slightly better than, quartic (a). In
(c) we have got rid of a high product moment, θ, by supposing the quartic condition
(lxxx.) rigidly fulfilled. This of course is not the case. It is clear that product
moments like θ of the 5th order are far from advantageous, and this is the same principle
which was in evidence when we found (lxv.) giving better results than (lxiv.) for
parabolic regression. Hence we must further conclude that the use of third, fourth or 
fifth product moments is disadvantageous as compared respectively with fifth to eighth 
moments of one variable. Or, a moment two degrees higher is preferable to a product 
moment in calculating correlation values. This is, I think, consonant with our 
knowledge of the relative magnitude of the probable errors in the two cases. 


(14.) General Conclusions. 


(i.) The present paper provides us with a general method of dealing with the 
regression line and the variability of arrays in the case of skew correlation, without 
any assumption as to the analytical form of the skew correlation surface. 

(ii.) It provides a nomenclature and classification of the types of array variability 
which may be of service. 

Arrays are either homoclitic or heteroclitic, according as their skewnesses are of 
equal magnitude or not. Arrays are further homoscedastic or heteroscedastic, 
according as their standard deviations are alike or different. Skew arrays are termed 
allocurtic; if arrays are symmetrical about their mean, they are isocurtic.

A heteroclitic system of arrays may be nomic or anomic, according as the skewness
of the arrays changes continuously or irregularly with the position of the array. 

A heteroscedastic system of arrays is also either nomic or anomic, according as the
standard deviation of the arrays changes continuously or irregularly with the
position of the arrays. Anomic heteroclisy and anomic heteroscedasticity probably only
signify that our material is either heterogeneous or too sparse to free us from the
large errors of random sampling in the extreme arrays. Still the terms will be
found of use in describing the actual data.

* ‘Roy. Soc. Proc.,’ vol. 71, p. 308.

The curve in which the skewness of the array is plotted to its position is termed 
the clitic curve; the curve in which the ratio of the standard deviation of the array 
to the standard deviation of the character in the population at large is plotted to 
position is termed a scedastic curve.
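The two curves just defined can be tabulated from raw arrays as follows. This is a sketch only: the skewness is taken here as the moment ratio m3 / m2^(3/2), which is an assumption and not necessarily the measure of skewness used in the memoir, and the data in `EXAMPLE_ARRAYS` are invented for illustration.

```python
from math import sqrt

# Hypothetical data: position of array -> values of the character in that array.
EXAMPLE_ARRAYS = {1: [4, 5, 5, 6], 2: [3, 5, 5, 7], 3: [1, 5, 5, 9]}


def moments(values):
    """Return the mean and the second and third moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return mean, m2, m3


def scedastic_curve(arrays):
    """Position -> (s.d. of array) / (s.d. of the character in the population at large)."""
    pooled = [v for vals in arrays.values() for v in vals]
    m2_all = moments(pooled)[1]
    return {x: sqrt(moments(vals)[1] / m2_all) for x, vals in sorted(arrays.items())}


def clitic_curve(arrays):
    """Position -> skewness of array, assumed here to be m3 / m2**1.5."""
    curve = {}
    for x, vals in sorted(arrays.items()):
        _, m2, m3 = moments(vals)
        curve[x] = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return curve


if __name__ == "__main__":
    print(scedastic_curve(EXAMPLE_ARRAYS))
    print(clitic_curve(EXAMPLE_ARRAYS))
```

In the invented example every array is symmetrical, so the clitic curve is zero throughout (isocurtic), while the scedastic curve rises steadily with position, a case of nomic heteroscedasticity.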

(iii.) The types of regression have been classified into linear, parabolic, cubic and 
quartic. For most practical purposes the first three suffice. Necessary criteria 
have been given for each case. But as in the case of the skew frequency of one 
character, an indefinite number of conditions ought theoretically to be fulfilled. 
Practically in dealing with frequency, no criteria are absolutely fulfilled, and the 
probable errors of the expressions used become unmanageable as we ascend in the 
scale. We must therefore be content to estimate the degree of approximation with 
which one or two necessary criteria are satisfied. 

The fundamental test of deviation from the familiar form of linear regression is the
inequality of the correlation coefficient r and the newly introduced correlation
ratio η. The probable error of this latter is determined. It is shown that
$\sigma_y\sqrt{1-\eta^2}$ is the mean standard deviation of a system of arrays in skew correlation.
The ease with which η can be calculated suggests that in many cases it should
accompany, if not replace, the determination of the correlation coefficient.
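A short sketch of the computation of r and η from arrays may make the test concrete. It assumes that η² is the variance of the array means (each weighted by its frequency) divided by the total variance, and it reads the "mean standard deviation" as the root of the frequency-weighted mean of the array variances; the function name and the data are hypothetical.

```python
from math import sqrt

# Hypothetical arrays: x-value -> y-values observed in that array.
EXAMPLE = {1: [1, 2, 3], 2: [4, 5, 6], 3: [4, 5, 6], 4: [1, 2, 3]}


def correlation_summary(arrays):
    """Return r, eta, the total variance of y, and the mean variance within arrays."""
    pairs = [(x, y) for x, ys in arrays.items() for y in ys]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    var_x = sum((x - mx) ** 2 for x, _ in pairs) / n
    var_y = sum((y - my) ** 2 for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    r = cov / sqrt(var_x * var_y)
    # Variance of the array means about the general mean, weighted by frequency.
    var_means = sum(len(ys) * (sum(ys) / len(ys) - my) ** 2
                    for ys in arrays.values()) / n
    eta = sqrt(var_means / var_y)
    # Frequency-weighted mean of the variances within the arrays.
    within = sum((y - sum(ys) / len(ys)) ** 2
                 for ys in arrays.values() for y in ys) / n
    return r, eta, var_y, within


if __name__ == "__main__":
    r, eta, var_y, within = correlation_summary(EXAMPLE)
    print(r, eta, eta ** 2 - r ** 2)
    print(sqrt(var_y * (1 - eta ** 2)), sqrt(within))
```

In this invented parabolic example r vanishes while η does not, so η² − r² is large; the two quantities printed on the last line agree, which is the identity stated in the text.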

In the determination of the constants of the regression curve we must use 
moments and product moments. The limitations to the order of the curve used 
depend: (a) on the labour of the arithmetic, (b) on the increasing probable errors of 
the higher moments and product moments. For these reasons it seems idle to propose
going beyond the 6th to 8th moments, or the 3rd to 5th product-moments. Practical
experience suggests that little is to be gained by using moments beyond the 6th, or
product moments beyond the 3rd. A quartic regression curve may be useful
occasionally, but it has yet to justify its necessity. As our object is not to reproduce
the given data, but to provide a graduation for them, which smooths down the
errors of random sampling, we believe that any legitimate and practical theory must
discard the high moments and high product moments with which THIELE and LIPPS
propose to deal.

(iv.) There is one point to which reference ought to be made. Some reader may
enquire why the method of my paper on curve fitting* should not be applied
to these regression curves in general, as we have in practice once or twice
already applied it. It would seem that that method is the easier, involving in the
case of the quartic only quantities analogous to our r, ε, ζ and θ. The answer is
straightforward: that process supposes every $y_{x_p}$ to have equal weight, or $n_{x_p}$ to be
the same for each array. Hence the higher moments of the x-character, which are
really involved, can be written down without calculation once and for all.* The
complexity of our present investigation arises from the introduction of the weighting
into the calculation of the moments of the x-character, as well as into that of the
product moments r, ε, ζ, θ. Our results therefore, although they might not look so
good on a graph of the regression curve, would be markedly better, if due weight
were given to the frequency of each array. The difference of the two conceptions is
comparable to the determination of the regression on the one hand from the
correlation coefficient, and on the other from merely striking a line through the
plotted means of the arrays. The method of moments in the present case, if we
except the use of η, is identical with that of fitting a curve to a continuum in space
by the method of least squares.

* “On the Systematic Fitting of Curves to Observations and Measurements.” ‘Biometrika,’
vol. I., pp. 265-303, and vol. II., pp. 1-23, especially the latter, pp. 11-15.
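The comparison drawn above can be illustrated in the linear case. The sketch below, with invented data and hypothetical function names, fits a straight line to the array means in two ways: weighting each mean by the frequency of its array, and giving every mean equal weight. Only the weighted line coincides with the regression obtained from the whole correlation table.

```python
# Hypothetical arrays of very unequal frequency: x-value -> observed y-values.
EXAMPLE = {1: [2, 2, 2, 2, 2, 2], 2: [3, 3, 3, 3], 3: [3, 5], 4: [9]}


def slope_from_all_pairs(arrays):
    """Regression slope cov(x, y) / var(x), i.e. r * sigma_y / sigma_x."""
    pairs = [(x, y) for x, ys in arrays.items() for y in ys]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    var_x = sum((x - mx) ** 2 for x, _ in pairs) / n
    return cov / var_x


def slope_through_means(arrays, weighted):
    """Least-squares slope through the array means, weighted by n_x or not."""
    xs = list(arrays)
    means = [sum(ys) / len(ys) for ys in arrays.values()]
    ws = [len(ys) if weighted else 1 for ys in arrays.values()]
    total = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / total
    my = sum(w * m for w, m in zip(ws, means)) / total
    sxy = sum(w * (x - mx) * (m - my) for w, x, m in zip(ws, xs, means))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


if __name__ == "__main__":
    print(slope_from_all_pairs(EXAMPLE))
    print(slope_through_means(EXAMPLE, weighted=True))
    print(slope_through_means(EXAMPLE, weighted=False))
```

The unweighted line lets the single observation in the last array count for as much as the six observations in the first, which is the defect the weighting is meant to remove.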

(v.) No stress whatever is laid on the actual instances here selected for illustration of 
the methods of this paper. I have merely chosen out of available material cases in 
which I had come across skew regression of various types. Thus we find:

(a.) The correlation of the number of branches and position of the whorl in 
Asperula odorata is practically parabolic, homoscedastic and of nomic heteroclisy. 

(b.) The correlation between auricular height of head and age in girls is cubical, 
of nomic heteroscedasticity and of anomic heteroclisy. It is probably really a case 
of isocurtosis. 

(c.) The correlation of size of cell and size of body in Daphnia magna, allowing 
for the irregularities produced by the ecdyses, is parabolic or cubic, of nomic 
heteroscedasticity, and probably, but for the above-mentioned irregularities, of 
isocurtic homoclisy. 

(d.) The correlation of the number of branches and position of the whorl in 
Equisetum arvense is cubical or possibly even quartic, of markedly nomic hetero- 
scedasticity and markedly nomic heteroclisy. 

It is not impossible that slips have occurred in the lengthy arithmetic involved, but
every important piece of work has been done independently twice, once by Dr. ALICE
LEE, whom I have most heartily to thank for her unwearying assistance, and once
by myself. To preserve uniformity of working, the constants have in each case
been carried to six figures. This involves little or no additional trouble, using as we
do mechanical calculators. The final results are of course of no value beyond their
probable errors, which will be in the second or third place of figures. No doubt I
shall be told that there is a show of accuracy in the number of decimal figures
retained, which does not really exist. It does not exist (and I am as fully conscious
of its non-existence as any would-be critic) so far as our results fit the actual
population, of which we have but a random sample. The figures, however, are of
importance, as far as testing accuracy of fit of result to actual sample goes. The
cubic or quartic curves may have coefficients insensible before the third or fourth
figure of decimals, and these coefficients have to be multiplied occasionally by
abscissæ of the third or fourth powers of 7 to 9. Hence to get ordinates true, as
far as the sample goes, to the second or third figure, we require to work to a fairly
high number of figures. There is no magic in six figures; four or five would probably
satisfy another worker, but they are easily read off the calculator we use, and if the
constants had been tabled only to four or five, no reader would have been able to
agree exactly, if he wished to test any of our results, even to three figures, with the
final ordinates.

* ‘Biometrika,’ vol. II., p. 12.
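The point about the number of figures can be tested on quartic (b) at the sixteenth position, where the abscissa measured from the mean is about 9.6. The following sketch, with illustrative names and the coefficients as read from the printed text, rounds the constants to four and to five decimals and compares the resulting ordinates with the six-figure value.

```python
# Quartic (b), constant term first, as printed to six decimals.
QUARTIC_B = (2.047717, -0.734966, -0.245667, 0.003096, 0.003161)
MEAN_POSITION = 6.403315


def deviation_ordinate(coeffs, position):
    """Ordinate of the quartic, measured from the mean, at a whorl position."""
    x = position - MEAN_POSITION
    return sum(c * x ** k for k, c in enumerate(coeffs))


def rounded(coeffs, places):
    """The same constants tabled to fewer decimal places."""
    return tuple(round(c, places) for c in coeffs)


if __name__ == "__main__":
    full = deviation_ordinate(QUARTIC_B, 16)
    for places in (5, 4):
        approx = deviation_ordinate(rounded(QUARTIC_B, places), 16)
        print(places, approx - full)
```

With five decimals the ordinate already shifts in the third place, and with four decimals in the first, since the coefficient of the fourth power is multiplied by a number of the order of 8,000.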
DIAGRAM III. SKEW CORRELATION BETWEEN SIZES OF CELL AND BODY IN DAPHNIA.
DIAGRAM V. SKEW CORRELATION BETWEEN BRANCHES AND POSITION OF WHORL IN EQUISETUM.